DevOps is doomed to fail: you can't scale NP problems

We live in a wonderful world: every new technology seems to prove that the old wisdom about avoiding NP problems was stupid.

The Travelling Salesman Problem? (Whose optimization version is NP-hard rather than NP-complete, I know.)
Well, doesn't Google Maps give you wonderfully optimized routes?

And K-SAT?
What is K-SAT, by the way?

SAT is the first problem that was proven to be NP-complete (the Cook–Levin theorem; Wikipedia, woohoo).

What does NP mean in computer science nowadays, translated into words that devops and business people can understand?

It cannot scale by nature. 

Most devs reason as if we can always add CPU, bandwidth, memory to a computer.

The truth is the world is bounded. At least by one thing called money.

So here is what I am going to do:
- first, try to help you understand the relationship between K-SAT and dependency resolution;
- then roughly sketch the underlying hidden problems;
- then tell you how we have cheated so far;
- and finally show that the nature of the problem is predictably in contradiction with accepted current business practices.

Solving the problem of knowing which packages to install, and in which order, given their dependencies, is NP-complete.


A more rigorous way to explain this reduction is here.

So: K-SAT is the generic problem of deciding whether a boolean equation with clauses of k variables can be satisfied, where some variables may in fact be expressions of other variables (solutions of other equations); those are the cycles I will talk about later.

After all, booleans are the heart of a computer; it should be easy.

It seems easy as long as the equation is a tree. And aren't all our languages based on parsing an AST? Don't they work? So we could just write a language for it.


Well, no. Computers manipulate data. A register does not tell you what the data is. Is it one variable of my equation, its name, its value, its relation to other addresses...?

The hard stuff in computer science is to make sense of data: to make it become information by handling the context.

Installing a package on a computer is in fact building a huge graph (40k nodes on Debian), and when a package is to be installed you begin by evaluating the first equation:
ready_to_install = AND(dependency is satisfied, for every dependency)
If it is false, you go to the dependency-solving stage.

For each dependency listed (down to the nth order),
build the list of packages not yet installed that would be required.

Then plan the installation of these packages with the solutions chosen (there may be more than one way to satisfy a dependency, so the equation has not one but potentially N solutions).

So... you have to evaluate them... recursively (because some parameters are solutions of other equations)... then stack them... sometimes a solution turns out to be bad, so you backtrack to another solution, modify the stack... and so on.
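To make that dance concrete, here is a minimal backtracking sketch in Python (a toy repository and helper functions of my own invention, not any real package manager's API or data):

    # Toy repository: for each package, a list of dependencies,
    # each dependency being a tuple of acceptable alternatives.
    REPO = {
        "app":    [("db",), ("nginx", "apache")],   # app needs db AND (nginx OR apache)
        "db":     [],
        "nginx":  [],
        "apache": [],
    }
    CONFLICTS = [{"nginx", "apache"}]               # mutually exclusive packages

    def consistent(installed):
        return not any(pair <= installed for pair in CONFLICTS)

    def resolve(target, installed):
        """Return an ordered install plan (dependencies first) or None if unsatisfiable.
        Simplified backtracking: when one alternative fails, try the next one."""
        if target in installed:
            return []
        plan, state = [], set(installed)
        for alternatives in REPO[target]:
            for choice in alternatives:
                sub = resolve(choice, state)
                if sub is not None and consistent(state | set(sub)):
                    plan += sub
                    state |= set(sub)
                    break
            else:
                return None                         # no alternative worked: give up on this branch
        return plan + [target] if consistent(state | {target}) else None

    print(resolve("app", set()))         # ['db', 'nginx', 'app']
    print(resolve("app", {"apache"}))    # ['db', 'app'] (apache already satisfies the OR)

Real package managers use much smarter solvers and heuristics, but the evaluate/stack/backtrack shape is the same.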

And that's it?

Meh, not really.

What if package A is installed, package B requires A+1, and A and A+1 are mutually exclusive? (Small cycle, e.g.: on CentOS 6.4, git requires git-perl and git-perl requires git.)
What if package B requires A, C, D; C requires E; E requires F and G; and G requires B? That is a circular dependency, a cyclic graph.
In which order do you install the packages so that after every step all the software still works? (I don't have the right to wipe installed software just to solve a version dependency.)

So, where is the catch?

Variable substitution can make the boolean equation unsatisfiable.

Example: A & B = True. Given A, what is the value of B? Easy: True.

The given equation is the desired state of the system: packages A and B should be installed.

A is True because package A is installed.

What if B = ~A ?

The equation is not solvable. A trivial case that normally doesn't happen.

What if B requires C and D, D requires N, and N is exclusive of A?
(Example: A == apache, N == nginx, and the software B requires nginx listening on 0.0.0.0:80.)
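To see why this breaks, here is a minimal brute-force sketch of that exact situation written as a boolean formula (pure Python, a toy encoding of my own; a real K-SAT solver is far cleverer, but the verdict is the same):

    from itertools import product

    VARIABLES = ["A", "B", "C", "D", "N"]               # A = apache, N = nginx

    def desired_state(v):
        return (v["A"] and v["B"]                       # we want A and B installed
                and (not v["B"] or (v["C"] and v["D"])) # B requires C and D
                and (not v["D"] or v["N"])              # D requires N
                and not (v["A"] and v["N"]))            # A and N both want 0.0.0.0:80

    solutions = []
    for bits in product([False, True], repeat=len(VARIABLES)):
        v = dict(zip(VARIABLES, bits))
        if desired_state(v):
            solutions.append(v)
    print(solutions)    # [] -> no assignment satisfies everything: unsatisfiable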

Testing for a cycle is easy given K known vertices. Finding how to check all the possibilities given all the N sets of I partial solutions is quite a bit more complex.
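The easy part looks like this: a minimal depth-first search over the circular example from above (B requires A, C, D; C requires E; E requires F and G; G requires B), with made-up one-letter package names. The combinatorial explosion lives in the other part, enumerating the partial solutions, not here.

    # Dependency edges: package -> packages it requires (the cyclic example above).
    DEPS = {
        "A": [], "B": ["A", "C", "D"], "C": ["E"], "D": [],
        "E": ["F", "G"], "F": [], "G": ["B"],
    }

    def find_cycle(deps):
        """Classic DFS with an 'on the current stack' set; returns one cycle or None."""
        visiting, done = set(), set()

        def dfs(node, path):
            visiting.add(node)
            for nxt in deps[node]:
                if nxt in visiting:                   # back-edge: cycle found
                    return path[path.index(nxt):] + [nxt]
                if nxt not in done:
                    cycle = dfs(nxt, path + [nxt])
                    if cycle:
                        return cycle
            visiting.discard(node)
            done.add(node)
            return None

        for start in deps:
            if start not in done:
                cycle = dfs(start, [start])
                if cycle:
                    return cycle
        return None

    print(find_cycle(DEPS))    # ['B', 'C', 'E', 'G', 'B']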

This is known as DLL hell!

This is what we call software requirements (see ITIL, which makes a lot of fuss about this).

We are already facing small problems, but nothing that really matters yet. We have not talked about how we cheat, and why some of us are heading for a disaster.

Why do computer engineers avoid NP problems by the way?


The universe is bounded.

The data structure needed to solve dependency resolution is a graph.

The vertices (nodes) are the packages (the variables).
The edges are the logical expressions of the software requirements (A.version > B.version).

So, before talking about algorithms, one basic fact:
in the worst case, when you add a node to a graph with n nodes, you add up to n new edges.

Thus the total number of relations grows more than linearly: a graph with n nodes can have up to n(n-1)/2 edges.

You still have to store the information ... in memory (for the work to be done fast).

Then, you have to detect the cyclic references. The first-order ones are easy.

But not always. There are ambiguities in the edges: A requires B.version > 1.1
and C requires B.version < 2.2, which may conflict if B is only available in versions 1.0 and 3.0... so... there is much more than meets the eye :)
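A minimal sketch of that ambiguity (toy version tuples and constraints of my own, nothing distribution-specific): each constraint looks reasonable on its own, yet no available version of B satisfies both.

    AVAILABLE_B = [(1, 0), (3, 0)]             # B only ships as 1.0 and 3.0

    constraints = [
        lambda v: v > (1, 1),                  # A requires B.version > 1.1
        lambda v: v < (2, 2),                  # C requires B.version < 2.2
    ]

    candidates = [v for v in AVAILABLE_B if all(c(v) for c in constraints)]
    print(candidates)    # [] -> A and C conflict, even though the ranges overlap on paper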

And cycles can be bigger than the usual classical case of 2 mutually exclusive packages.

But that is not all.

The normal algorithmic way to solve the equation is to build the graph and do a systematic evaluation of the cases.

In the worst case, the computing time grows explosively.
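«Explosively» here means exponentially in the worst case: a naive check of a boolean formula over n variables walks through 2^n assignments. A back-of-the-envelope sketch (just arithmetic, not a statement about any particular solver):

    import math

    # Worst case: 2**n candidate assignments for n boolean variables (packages).
    for n in (10, 40, 100, 40_000):
        digits = int(n * math.log10(2)) + 1
        print(f"n = {n:>6}: 2**n is a number with {digits} digits")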


But we are not in the worst case: on my «blabla» OS it takes me 3 s to install a package with 4k packages installed, and still 3 s with 41k packages installed.


Well, we cheat.

One part of the cheat is not going for the exact general solution: K-SAT solvers are optimized for the known properties of real-world packages.

We cheat even more by relying on human beings.

Maintainers in most distributions are doing an excellent job at testing, fixing the bugs that OS users report, and keeping dependencies very minimal.
We are in a special case where the graph is not very dense (few edges per node).

The algorithm seems to scale. But... it can't... since we are leaving the domain of validity of the K-SAT solvers we use: optimizations that rely on sparse connections // few requirements per piece of software.

The DevOps problem is not ONE computer. It is a set of computers with different operating systems, plus in-house developers who ignore what packaging is all about.

So you don't have one set of equations to solve your dependencies, you have n sets. And now the requirements may link to other sets of equations:
example: my Python program on server X requires nginx on the front end Y.

Oops, I don't have a graph of 40k nodes anymore, but 800k nodes now.
Do you want to compute the number of potential edges with me? No? It is huge.
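Alright, let's do it anyway, just for the order of magnitude (plain arithmetic: an undirected graph with n nodes has n(n-1)/2 potential edges):

    def potential_edges(n):
        return n * (n - 1) // 2      # every pair of nodes is a potential requirement or conflict

    print(f"{potential_edges(40_000):,}")     # 799,980,000     (~0.8 billion for one OS)
    print(f"{potential_edges(800_000):,}")    # 319,999,600,000 (~320 billion, ~400x more)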

My set of dependencies has grown a lot. The input data of my algorithm has grown much more than linearly, and the CPU time needed to solve the new problem will grow even faster.

If your apt-get install apache takes 3 seconds on your Ubuntu, your Chef deployment will take you 3 minutes.

And, in real life, there are still people installing software from source without using a package manager (as if that was not complex enough).

So your data are possibly not even accurate.

To sum up, we tend to:
- multiply the number of edges more than linearly;
- increase the number of vertices more than linearly;
and we feed that to an algorithm that, in the worst case, takes exponentially more time as the input grows, while our choices keep pushing us towards that worst case.

The time and the complexity are increasing dramatically.

Why old wisdom matters!

“I tend to think the drawbacks of dynamic linking outweigh the advantages for many (most?) applications.” — John Carmack

The fashion on Android and OS X is to prefer statically built applications. It reduces the number of edges in the graph a lot. It reduces the software requirements... on the front end.

But smartphones and tablets are very much CPU/IO/battery bound, so we offload more and more computing to a distributed system called the cloud.

And let's zoom in on the cloud system's requirements.

Since we have outgrown the resources available on one computer, we are replacing the in-memory cache shared by several threads with distributed in-memory caches (memcached, mongo, redis...). We are adding software requirements. We are scattering/caching/backing up data everywhere, at all levels.

Since we can't serve the application from one server anymore, we create cross-dependencies to raise the SLA.

Example: adding a dependency on HAProxy for web applications.

For the SLA.

So your standalone computer needs no 99.9% SLA when it is shut down.

But now, since we don't know when you are going to use it, or where you are, we have to increase the backend's SLA.

By the way, SLAs add up (or rather, the unavailabilities do).

My CDN is 99.9%.
My Heroku is 99.9%.
My ISP is 99.9%.
So my SLA is now... somewhere between 99.9% and 99.3%. Yep, you forgot to count the necessary links between your CDN, Heroku, and your customers...
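A minimal sketch of the arithmetic behind those two bounds (my own reading of the figures: 99.3% corresponds to counting the links as four more 99.9% components in series; this is plain probability arithmetic, not anyone's official SLA maths):

    def composite(availabilities):
        """Availability of a chain where every component must be up (independent failures)."""
        total = 1.0
        for a in availabilities:
            total *= a
        return total

    print(f"{composite([0.999] * 3):.2%}")    # CDN + Heroku + ISP           -> 99.70%
    print(f"{composite([0.999] * 7):.2%}")    # ...plus the links in between -> 99.30%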

You need a 99.9% SLA. Cool: that is your upper bound.

But you are building in a growing uncertainty for the worst case.

Or you could demand a higher SLA from your providers.

What is the SLA beast?


Service Level Agreement: the availability of a service over a given period, on average.

99% SLA over one year ~= 3.65 days down.

Would you still use google/fb/twitter/whatever if it was down 4 days per year?

If you run a business, 1% of downtime on a critical service (like mail) means 1% less gross income.

So ... our modern distributed technologies are aiming at 99.999%

Mathematically, the composite SLA is thus a decreasing function of the number of components you chain.

And these architectures are de facto based on increased requirements. They rely on an algorithm that is NP-complete.

Mathematically, dependency resolution is (in the worst case) an exponentially time-consuming function, and you are feeding it input that grows more than linearly.

So ....

Mathematically, the two curves are bound to intersect.

Just as a reminder: Chef recommends 30 min per run // the equivalent apt-get install on your computer takes 3 to 45 seconds.

Here are the numbers:

Availability    per day       per month    per year
99.999%         00:00:00.9    00:00:26     00:05:15
99.99%          00:00:09      00:04:23     00:52:35
99.9%           00:01:26      00:43:49     08:45:56
99%             00:14:24      07:18:17     87:39:29
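Those figures are nothing more than (1 - availability) × period. A quick sketch to regenerate them (assuming a 30.44-day average month and a 365.25-day year; sub-second downtimes show up as a full second here, and values above one day print as 'X days, hh:mm:ss'):

    from datetime import timedelta

    PERIODS = {"per day": 86_400, "per month": 2_630_016, "per year": 31_557_600}   # seconds

    print("Availability  " + "  ".join(PERIODS))
    for sla in (0.99999, 0.9999, 0.999, 0.99):
        downtimes = [str(timedelta(seconds=round((1 - sla) * s))) for s in PERIODS.values()]
        print(f"{sla * 100:g}%".ljust(14) + "  ".join(downtimes))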

So, well... imagine a distributed deployment goes bad: what do you think happens to the SLA?
And do you trust people who say they never make any mistakes?

I don't say the day is near when this NP-complete aspect of software deployment will bite us.
I say that day exists.
I say the non-linear nature of the problem makes it impossible to predict when.
I say the phenomenon will be very abrupt, because of the nature of the problem.
I say we are pushing towards the choices that will create the problem.
I say business analysts, companies, and CTOs will not see it coming.

And that is my last point:

Our scientific education makes us blind to non-linear problems

The first words of your science teachers, the ones you had forgotten before they even taught you science and math, were: «most problems are not linear, but we only study the linear ones because they are the only ones for which we can easily make accurate predictions».

If you have a linear system you can predict, plan... and make money. Without that, well, you are playing the lottery.


What are non-linear things?
- weather (weather forecasts beyond 24 hours are still a scam even though our computers have been crunching ever more data for 40 years);
- actuarial science/finance: selling products based on the probability that a connected problem will happen;
- resource consumption (coal, oil, fish, cows);
- biodiversity;
- cryptography (you look for operations that are cheap to compute and expensive to reverse);
- floating-point behaviour ((a op b) op c == a op (b op c) is not always true; see the snippet after this list);
- economy;
- coupled moving systems in classical physics (randomness can easily be obtained from predictable systems if you couple them the right way);
- quantum mechanics (it bounds the max frequency of the CPU);
- the movements of the planets (feel free to send me your exact solution for where the Moon will be relative to the Sun in exactly one year, in a reference frame made of 3 distant stars);
- internet bandwidth when bought from a tier-one provider;
- real life, sociology, politics, group dynamics...
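And since the floating-point item always raises an eyebrow, a two-line illustration (standard IEEE 754 doubles, nothing exotic):

    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))    # False: addition of doubles is not associative
    print((a + b) + c, a + (b + c))      # 0.6000000000000001 0.6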

You see a common point there?

We still have not solved these problems, and we do not learn how to solve them in our regular curriculum.

I don't say there are no solutions.
I say there is no solution yet. We will never find the solutions if we don't become aware of the problem.
Non-linear problems are not a computer problem. They are an intellectual problem that requires proper thinking.
It requires education.

We are pretending to live under the empire of necessity, but there is no necessity to accept this reign.

We try to build a new world with the wrong tools, because we make the false assumption that we can handle the problems we face with the methods we learned at school. We rely on the «shoulders of giants» to make the right tools. But since we are not well educated, we invest money in the wrong tools for our problem. Tools often made by the right guys for no actual problem.


Firstly, we should slow down our adoption of async/distributed systems;
Secondly, we should lower SLAs to reasonable levels. If a 2% service interruption on one of your production services can kill you, your business is not reliable;
Lastly, we should understand that the more efficient our systems become, the more fragile they get.
It may be time to trade efficiency for durability. It may be time to slow down and enjoy all the progress we have made.