In the past week I had to interview more than a dozen man and woman for my company. Since all interviewer were fresh out of the universities, I thought that some classic CS riddles would give me a good indication of both knowledge and capabilities of the candidates.
Although I was usually good with riddles I sometimes had difficulty with riddles I didn’t immediately know what to do with. Through time I have analyzed questions I know how to solve and questions I was unable to solve (not just riddles). This turned out to be great fun and mental exercise.
I’d like to share with you what goes through my head on one such problem. This problem got a “magic” solution and its basically impossible to figure out of no where, but I want to show you how you can direct yourself towards it and hopefully it won’t look like magic when we’ll get there.
The riddle is quite simple: You have an array containing integers, the length of the array is n which is an odd number. Every number in the array appears exactly twice, except for one number which appear once.
For Example: [3,3,5,5,8] and [9,7,9,7,6,6,5,4,8,4,8] are such arrays. The goal is to find the number which appears exactly once – for the first example the answer is 8 and for the second example the answer is 5.
Take a couple minutes to think of it! 🙂
I LOVE to hear about the naive solution and start with it myself . There are couple of reasons for that:
- Easiest to think of and intuitive.
- What you’ll end up coding (most of the time..).
- A base line from which we can improve.
For this question there are two simple naive solutions (I can think of):
First naive solution
Loop over the array, for each element in the array I reiterate the array from start to finish and see if that number appears again and if the answer is no then we found our number.
This is almost by definition the coding of what we were asked to do. This solution works in time of O(n^2) worst case and O(1) space complexity.
Second naive solution
Loop over the array and build a dictionary (or a map) that maps an element to the number of times that number appeared so far. We then loop over that dictionary and return the number which has a count of 1.
This solution works in time of O(n) worst case and O(n) space complexity. We traded time for additional space.
How to move between solutions
Let’s say we thought of the first solution and we can’t think of the second solution.
In the first solution the time was high, looping over the array seems like a must-do, the other n at the n^2 came from the check if the number appears again, this was a calculation so we can ask ourselves if we can do some prepossessing to be able to compute this information faster. Usually this would requiring remembering information so we could access it faster (O(1) in our case).
With this in mind we have reduced ourselves to the following question: Can we do some prepossessing such that we can answer fast on the question does a number appears twice ? maybe this sounds too close to the original question, how about this generalization : You receive an array and require to answer how many times a number appear (if we can do this we can check if that result equals two or not, so this is indeed a generalization). This makes it easier to think of the dictionary solution, this would be exactly what we want to keep track of.
You can also move from the second solution to the first one by asking yourself similar questions. The tradeoff of time to space is a natural one if you think like that.
What to improve to – and a better solution
We don’t know anything special about this array, the numbers are not bounded and its not sorted. Not a mathematical proof but it seems that less then O(n) time we won’t be able do.
The range of solutions we are thus interested is between n and n^2. Is there something in between ? lot’s of stuff..anything common ? just nlog(n) is really common. What takes nlog(n) ? sorting (that’s good to remember that). Can sorting help ? we can go over some examples and I hope it would become evident that the answer is yes (I leave you the details).
How did I think of sorting ? Was I really clever or used black magic ? No, I wanted something between what I have and a lower bound I believed in and that’s it.
What an optimal probably looks like
So we think that we can’t do less then O(n) in time, but we already did O(n) time with additional space. We also have a solution with O(nlog(n)) time and without additional memory. Since we have an optimal solution time-wise (but not space-wise), and have a near optimal solution time-wise and space-wise (nlon(n) instead of n) I guess that if there is room for improvement then it would be a solution of O(n) time and O(1) space complexity.
An Optimal Solution
We are looking for a solution with O(n) time and O(1) space. This means we pass over the array a few times, but don’t save too much data.
How many times can we pass the array ? A constant number of times. Would it be 17 ? 39 ? probably the answer is once or twice. I don’t know this for sure, but it’s unreasonable that a 17 or 39 would come out of no where (as it seems now). The same goes for the amount of data we would save – it’s probably just a variable or two.
So we want to pass the array once or twice, and save one or two numbers – that’s our lead.
What would those numbers be ? they must be some aggregates, a function of all the data. An idea: How about the sum of all numbers ? for the first example its 3+3+5+5+8..
Let’s examine the idea: Sum doesn’t seem to be the right function, but if only it would count the second element with a minus sign (3-3+5-5+8 = 8) then we would have our way. The problem is that we can’t know if this is the second time we have seen the number (takes as back to the naive solution of either calculating it or saving it with extra space..).
The property we want this function to have is that if it gets the same number twice then both numbers cancel each other out (as in +3-3 = 0).
How does such a function look like ? lets write its inputs and output, how would it look for a single bit number ?
The same number cancel each other out so: f(0,0) = 0 , f(1,1) = 0.
What about f(0,1) or f(1,0) ? we don’t want that different numbers to cancel each other out (e.g we want f(3,3) = 0 but not f(3,5) = 0). So f(0,1)=f(1,0)=1. The
Lets summarize what we want our unknown function to look like:
- f(0,1) =1
I propose f=XOR. Lets examine its properties together (these are thing we can discover along the way by trying an example or two):
- XOR(x,x) = 0 for any x
If the array is sorted, then XOR’ing the elements would give as our number. Why ? Lets try this on the first example, XOR’ing
- XOR(a,a) = XOR(3,3)=0
- XOR(currentResult,a) = XOR(5,5) = 0
- XOR(currentResult,a) = XOR(0,8) = 8
But what if the array isn’t sorted – which is our case ? I claim that again this works, if our f (which is XOR) if associative. In that case we can arrange the elements in any way but the result would stay the same (so “f(sortedArray)=f(arrayInAnyOrder)=missingNumber”). We can check that indeed the XOR method is associative.
Thus an algorithm is to XOR all elements and return the result.
I hope that XOR seems reasonable after all that we have gone through – but take a look back at the original question and XOR looks like a magic solution.
I hope you had fun going over this riddle with me. Understanding a problem and what a solution needs to satisfy doesn’t help just in riddles, its a useful skill in itself, and it’s actually quite fun to do.