Hacker News

Fun fact: I was asked to find a solution to this problem during an interview at a proprietary trading firm last year.

I wish I had read this paper before.



The algorithm for the simplest variant of that problem (take a single random sample from a list of a priori unknown length) can be derived by simple inductive reasoning. If the list has size 1, the problem is trivially solved. Otherwise suppose we have a list (x:xs) and we recursively solve the problem for xs. If all the recursion returns is the random sample from xs, we clearly have insufficient information to take a random sample from (x:xs), so we need to strengthen the induction hypothesis by having the recursion also return the length of xs. Let (r', n') be the random sample and length for xs. Then the result for (x:xs) is (r, n) where n = n' + 1 and r = x with probability 1/n or else r'.
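A literal Python transcription of that induction might look like the following (a sketch, not production code: the function name is mine, the list is assumed nonempty, and Python's recursion limit makes the recursive form impractical for long lists, which is exactly why one usually flattens it into a single loop):

```python
import random

def sample_one(lst):
    """Uniform random sample from a nonempty list, plus its length,
    following the inductive derivation: solve for the tail, then keep
    the head with probability 1/n."""
    if len(lst) == 1:          # base case: singleton list
        return lst[0], 1
    x, xs = lst[0], lst[1:]
    r_tail, n_tail = sample_one(xs)   # (r', n') for the tail
    n = n_tail + 1
    r = x if random.random() < 1.0 / n else r_tail
    return r, n
```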

Another way to derive this algorithm is to notice that any solution must implicitly or explicitly compute the length of the list as a byproduct. If you start by writing down the recursive function for computing the length of a list, composed with the straightforward recursive function for taking a random sample from a list of known length, you can apply a standard fusion and deforesting transformation to combine them into a single pass, and you end up with the same algorithm as above.
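For concreteness, here is the two-pass starting point that the fusion would operate on, sketched in Python (hypothetical names; the recursive known-length sampler keeps the head with probability 1/n, else samples the tail):

```python
import random

def length(lst):
    # recursive length of a list
    return 1 + length(lst[1:]) if lst else 0

def sample_known_n(lst, n):
    # straightforward recursive sample from a list of known length n:
    # head with probability 1/n, otherwise recurse on the n-1 tail elements
    if random.random() < 1.0 / n:
        return lst[0]
    return sample_known_n(lst[1:], n - 1)

def sample_two_pass(lst):
    # the composition that fusion/deforestation collapses into one pass
    return sample_known_n(lst, length(lst))
```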

Here's a fun problem to ponder. Find the most frequently occurring element of a list in O(n) time and O(1) space. You may assume that this element occupies more than half of the list's entries.
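(Spoiler for anyone who wants to check their answer: the classic solution is the Boyer-Moore majority vote. A minimal Python sketch, assuming the promised majority element actually exists:)

```python
def majority(xs):
    """Boyer-Moore majority vote: O(n) time, O(1) extra space.
    Correct only if some element occupies more than half of xs."""
    candidate, count = None, 0
    for x in xs:
        if count == 0:
            candidate, count = x, 1   # adopt a new candidate
        elif x == candidate:
            count += 1
        else:
            count -= 1                # a mismatch cancels one vote
    return candidate
```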


>Then the result for (x:xs) is (r, n) where n = n' + 1 and r = x with probability 1/n or else r'.

Do you mean r = x:r' instead of r = x?


No, r has the same type as x. The logic behind that probability split is simple: We have r' (the random sample taken from the tail list xs) and x, and we have to choose between them with some probability based on n. Since all n elements must end up equally likely, x should be chosen with probability 1/n; the remaining (n-1)/n of the probability mass, spread uniformly over the n-1 tail elements by r', gives each of them a 1/n chance as well.
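A quick Monte Carlo sanity check of that claim (a throwaway sketch, not from the thread; pick_one is the usual one-pass loop form of the algorithm):

```python
import random
from collections import Counter

def pick_one(lst):
    r, n = None, 0
    for x in lst:
        n += 1
        if random.randrange(n) == 0:  # replace the sample with probability 1/n
            r = x
    return r

# each of the 5 elements should be chosen close to 20,000 times
counts = Counter(pick_one([1, 2, 3, 4, 5]) for _ in range(100_000))
```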

When you approach it inductively the way I did, there really isn't any choice in the matter, which is why this algorithm design technique is so powerful. Udi Manber developed the technique in his paper Using Induction to Design Algorithms from 1988 and later used it throughout his great old book Introduction to Algorithms: A Creative Approach from the early 1990s. Here's the paper in case you're curious: http://akira.ruc.dk/~keld/teaching/algoritmedesign_f05/Artik...


This way of thinking also makes it easy to derive the max-sum algorithm from Programming Pearls:

We have to compute max-sum for (x:xs), so first solve it for xs. If the max-sum is within xs, we are done; if not, it somehow involves x. To find out, the recursion also has to return the best sum over prefixes of the list (call it max-begin).

A small case analysis can find out what the new max-sum is: x, x + max-begin(xs), or max-sum(xs)? And the new max-begin is either x or x + max-begin(xs).

The naive recursive implementation will require linear space though, so there is an extra step to find out how to eliminate the recursion.
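After that recursion-elimination step, the fused version can be sketched in Python like this (my sketch, not from the thread; the loop runs from the tail so the state mirrors the induction on (x:xs)):

```python
def max_sum(lst):
    """Maximal subsum over nonempty contiguous sublists, in one pass
    and constant space, iterating from the tail of the list."""
    # base case: the singleton suffix [lst[-1]]
    best = begin = lst[-1]
    for x in reversed(lst[:-1]):
        begin = max(x, x + begin)   # new max-begin: x or x + max-begin(xs)
        best = max(best, begin)     # new max-sum: max-sum(xs) vs the cases involving x
    return best
```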


Yes, Manber derives the linear-time algorithm for the maximal subsum problem exactly like that in his textbook. As you say, if x is involved in a maximal subsum, it must extend the tail's maximal prefix sum, so you return that in addition to the maximal subsum.

Regarding recursion vs iteration, Manber generally develops the right induction hypothesis gradually, using informal language ("remove the element and solve the smaller problem") in the process, and only turns that into a precise algorithm once the final induction hypothesis has been found, so he doesn't present the recursion elimination as a separate step. That works well pedagogically. His main idea is to guide the student's intuitions rather than formally derive a program that's correct by construction the way someone like Richard Bird might have done it. (Bird has an interesting derivation of the maximal subsum algorithm that begins with the brute-force algorithm written as a combinator-based functional program and gradually transforms it in a correctness-preserving way using program calculation techniques until arriving at the fast linear-time program.)


Whoops, I misinterpreted what you meant by "single random sample". I will definitely check out that paper though.



