A few months ago I took a graduate course on random algorithms.
While I enjoyed the course content, I couldn’t (at first) accept adding randomness to any real project I’m working on. My thinking was that theory is nice but practice is another thing.
Why did I oppose the concept of random algorithms in my code ?
I had two main reasons:
- Readability and Predictability – Code is difficult to understand as it is, it’s even hard to predict the behavior of normal deterministic code, so to add randomness on top of this doesn’t sound like the right thing to do, no matter the case.
- Chance of not working – Almost all the algorithms in the course had some probability of not working. As a developer, the thought of having a code that might not work was totally unacceptable. I test my code and I use defensive coding techniques to make sure that my code does not fail. ever.
But during the course and in the months that followed I started to see random algorithms around me, I realized that when done right they actually increase readability and simplicity and that I can ignore the chance of them not working (don’t panic! I’ll explain why).
In the following post I’ll explain the basics of random algorithms and how they differ then “ordinary” algorithms. I’ll give examples of one “textbook example” and one known real world example. I’ll also touch a bit on testing and the chance of the algorithm of failing (and why we can ignore it) – so keep reading 🙂
What are random algorithms
Taken from xkcd at: https://xkcd.com/221/
A random algorithm is an algorithm is at its heart an algorithm that can toss coins and write code such as
and that’s about it – the algorithm can decide on an operation based on a result of a value which is random. If you’ll put a breakpoint on the first line in the snippet and run the code a couple of times (without changing anything!) then in some times the result would be heads and in others it would return tails – and this value isn’t determined until the execution of the flipCoint method in runtime.
This may doesn’t sound like much, I know. But this means we can get a random bit – a bit that has 50 percent chance of being zero and 50 percent chance of being a one.
If we can get a random bit we can also get a random string of bits of any length – for example we can get 128 random bit by running the above code within a loop.
128 sounds like a small number, and it’s still unconvincing that it’s really helpful – but ask yourself this question: How many different options does a 128 bits string have ?
Let’s answer this question for two bits: for two bits we have the following four options: 00,01,10,11. The first bit has two options, the second bit have two options and we can mix them to get a total of 2 * 2 = 4 options. The same logic applies to 128 bit string to get
2 * 2 * 2 * … *2 (128 times) = 2 to the power of 128 = 340,282,366,920,938,000,000,000,000,000,000,000,000
and that is a lot of different options!
We will see how this enormous amount of options actually helps us, but the key point is that it was very simple to generate this option space – a single loop of 128 iterations – this is powerful.
The Text book example
This example is meant to show you how random algorithms can help you design an algorithm that no one can design an input that the algorithm runs slowly on.
The problem is a simple one: You have an array of numbers with one million numbers. Half of the numbers are one (1), and the other half are zero (so we have half a million elements which are zero and half a million elements which are one).
You need to find an index of an element that it’s value if 0.
Take a second to think about it…yes, a simple loop until you’ll find a zero does the trick 🙂
But now ask yourself – how many iterations did you have to do ? In the worst case the array looks like [1,…,1,0,…,0] (half a million ones followed by half a millions zeros).
In this case you run half a million iterations!
Can we do better ? somewhat surprisingly – NO. No matter how you iterate the elements there is always some array with has all its ones in the first half a million indexes you are going over.
If you’ll get many requests with this input – your algorithm would always run slow, and we all know that if this case exists it would happen in production (developers luck).
So how random algorithms help ?
Let’s see the algorithm then say how it improves:
- Pick a random index //We need 20 bits to represent an index between zero and a million
- Check if the array at that index is zero, and if that’s the case return the index
- Otherwise go back to step one.
How many iterations does this algorithm has to do in the worst case ? Infinity.
That’s doesn’t sound good at all! But wait — with random algorithms we always have to ask what is the chance of something happening. For example – what is the probability that we do more then 128 iterations ? By the same logic as we did above it’s one in 340,282,366,920,938,000,000,000,000,000,000,000,000.
No matter the input – it’s incredibly likely that we won’t do more then 128 iterations, and this is so much less then the half a million result we got with the deterministic algorithm that couldn’t flip coins.
But can’t we get unlucky ? I don’t want to take my chances!
This is what I thought at first. Actually – we are always taking chances.
Would I bet my career on a 1:2^128 chance or a 1:2^256 chance ? Yes, I do this every day.
How low is 1:2^256 ? it’s more likely that you would fill out a lottery ticket and win the grand prize, and then do it again and again. There are less atoms in the entire Earth then 2^256. It’s that small.
Still not believing ? Encryption is based on the fact that you can ‘guess’ 256 bits. If you use a credit card or shop online – your’e taking a 1:2^256 chance with all your savings.
It takes a while – but when it sinks you’ll realize how like this number is really is – you’ll be convinced.
A real world example
Ever heard about a GUID ? a guid is a ‘Globally Unique Identifier’ which basically means a unique Id. How unique ?
Imagine you had to write code that returns different id’s. A naive solution is to start at zero and go up. What’s wrong with this solution ? What if your’e app crushes and starts again ? it’s starts again from zero, which is no longer unique.
So we try to fix this problem, and save the current Id to disk, but what if we run this exact application on two different machines which are not connected to any network ? No matter how you generate the Id – both machines would start at the same Id and it won’t be unique.
This is a problem that can be incredibility complicated to solve even when all those machines are connected to the same network and exchange information, and it can be incredibly simple even in the more general setting when the machines are not connected to any network.
The solution using random algorithms is a one-liner : Pick 256 random bits and let this be the identifier. Again the chances of this going wrong is as good as non existent.
How good is what I said ? Can you rely on it in production ?
Microsoft does. It’s worst – Microsoft uses 128 bits which is a lot more likely to go wrong (and still the chance is almost non existent!) . That’s how reliable random algorithms are!
Hope you’ve enjoyed this post, I’d love to hear your opinions and experiences with random algorithms!