## Saturday, 23 June 2012

### In praise of Knuth - the joy of algorithms

Since I am taking part in a Turing-related Lates evening at the Science Museum on Wednesday 27 June, in which my contribution will look at a curious mathematical algorithm and its applications in computing, I have been thinking about some of the algorithms I have used in programming, and this post will describe one particular favourite (not the algorithm I will be demonstrating on Wednesday).

I used to be a keen bridge player.  Sometimes in competitions hands were "computer-dealt" and people noticed that the card distributions seemed to be less regular than in hands shuffled and dealt physically.  Good players even adjusted their play according to whether the hand was dealt by a human or a computer.  The moral is that human shuffling isn't perfectly random.

Anyway, I wanted to programme the shuffle of a pack of cards so that I could use computers to explore strategies for playing various forms of patience ("solitaire" to Americans).  In particular a student project verified that the probability of clock patience coming out is indeed 1/13, as I had proved as a schoolboy.  How do you get a computer to shuffle a pack of cards (or equivalently to generate a random permutation of the numbers 1 to 52, or more generally 1 to n)?  (In this post I am using "random" to mean that all possibilities are equally likely, which is what one hopes for in shuffling cards.)

I thought the obvious approach was to generate successive random (equally likely) integers between 1 and n, simply rejecting those that had already been drawn.  As we go through the shuffle, we are more and more likely to generate random numbers corresponding to cards already dealt, so towards the end it can take a long time to find an undealt card - when there is only one number left to be drawn, the probability of hitting it is only 1/n and the expected number of draws required to get it is n.  In practice this algorithm seemed to work OK for a pack of 52 cards - the probability that it takes an absurdly long time to get the last few cards is in fact quite small, and some clumsy ad hoc adjustments could improve it.  The result was a messy piece of code which might take some time to complete the shuffle for a large pack.

Then a colleague (thanks, Don) told me about Knuth's shuffling algorithm (which it turns out is a variation of one devised by Fisher and Yates before the computer era).  It's beautiful.  To shuffle n cards, you simply do the following: for i between 1 and n, we successively swap card i with a random card in the pack.  It's easy to see that the result is a random permutation of the pack and that all possible permutations are equally likely.  We have a simple algorithm, very easy to programme, that generates a random shuffle in a fixed number of steps.

One of the joys of mathematics can be finding that there is a beautiful, simple solution to a problem on which one has spent a lot of time working with messy, unsatisfactory methods.  This was such an example.  I remember how happy I felt when I learned about this algorithm!

Of course, the downside is that I am reminded that people like Knuth always seem to produce beautiful, simple algorithms in contrast to my clumsy efforts.  It's a bit like watching Messi play football - how many can aspire to play like that? - except that I am left with the frustrating feeling that I should have been able to come up with the algorithm myself.  Luckily perhaps I have now reached an age where I can appreciate beautiful mathematics without beating myself up over my failure to do it for myself.

1. When I started my current job, my first project was to help the company who made our e-assessment system to port it to javascript. On looking through the small amount of code they'd already written, I came across a procedure for shuffling a list of questions.

You might think of the first algorithm you gave above as the "naive" algorithm - the one you write down before thinking about the problem. The algorithm I found was way worse than that.

It went like this:
1) Pick a number in the range 0 to 100000. Because 100000 is the Biggest Number Imaginable.
2) If the number you picked is bigger than the length of your list of questions (extremely likely), pick another number.
3) Eventually you get the index of a question in your list. Now repeat steps 1 and 2 until you get another number in the right range which is different from the first one.
4) Swap those two questions.
5) Repeat the above 20 times. Because 20 is the Smallest Reasonable Number.

This fails in just about every way possible. Not only is it not guaranteed to finish in finite time, it doesn't come close to producing a uniformly random shuffle and has a completely arbitrary limit on the size of the lists it can shuffle.

Needless to say, I immediately replaced it with Knuth's algorithm. Shortly after that we decided it was best to start from scratch on our own system.

Though I do now tell PhD students learning to code to be aware that 100000 is the biggest number they can use in an algorithm.

2. Love it! Many years ago I worked on mathematical modelling software which used the value 1.23456D65 as an indicator that a particular variable was "held". I always wanted to create a model in which some variable would genuinely take this value, thus causing the software to go haywire.

3. Peter Norvig tells an interesting story about how when he has at high school, his teacher presented an erroneous shuffling algorithm. Norvig noticed it was wrong and independently came up with Knuth's algorithm P

4. Sounds like Peter Norvig rates himself

5. To shuffle n cards, you simply do the following: for i between 1 and n, we successively swap card i with a random card in the pack.

Actually we swap card i with a random card from i to n.