Saturday, November 09, 2013

Project Euler #78

If you have a pile of n coins, there's some number of distinct ways to divide them up into piles; let's refer to the function that maps a number of coins to the number of ways to divide them up into piles as p. The example they give is for five coins; as you can see, p 5 == 7.

OOOOO
OOOO O
OOO OO
OOO O O
OO OO O
OO O O O
O O O O O

Things to note:
  1. the coins are all alike; it's not like you have different denominations that could distinguish one pile from another with the same number of coins.
  2. order doesn't matter; OOO OO and OO OOO don't count as different.
One's first inclination is to say "If you create a pile of m coins, that leaves n - m, and there are p (n - m) ways to divide them up, right?" Not exactly; if you don't confine yourself to piles of m or fewer coins, you'll violate (2). So we need what we'll call p', where p' n m is the number of ways to divide up n coins into piles of size m or less. Then you have

p :: Int -> Int
p n = sum $ map (p' n) [1..n]


That leaves p'. There are some obvious base cases:

p :: Int -> Int -> Int
p' n m
    | m >= n - 1 = 1
    | m == 1     = 1

If not those, then it turns out to be

    | otherwise = sum [p' (n - m) i | i <- [1..m']]
                      where m' = min (n - m) m


That matches up with examples I did by hand, and in fact corrected some I managed to get wrong, so I am fairly sure it's correct.

Unfortunately, the full Euler Project problem asks for

head [n | n <- [1..], (p n) `mod` 1000000 == 0]

and this way of calculating p' and thence p is up there with the canonical recursive Fibonacci series function for slow. We need a faster way to evaluate  p', or even better a fast way to calculate p without having to bother with p'.

So we generated some lines of output, the ith line being map (p' i) [1..i],  Here it is, left justified:

[1]
[1,1]
[1,1,1]
[1,2,1,1]
[1,2,2,1,1]
[1,3,3,2,1,1]
[1,3,4,3,2,1,1]
[1,4,5,5,3,2,1,1]
[1,4,7,6,5,3,2,1,1]
[1,5,8,9,7,5,3,2,1,1]
[1,5,10,11,10,7,5,3,2,1,1]
[1,6,12,15,13,11,7,5,3,2,1,1]
[1,6,14,18,18,14,11,7,5,3,2,1,1]
[1,7,16,23,23,20,15,11,7,5,3,2,1,1]
[1,7,19,27,30,26,21,15,11,7,5,3,2,1,1]
[1,8,21,34,37,35,28,22,15,11,7,5,3,2,1,1]
[1,8,24,39,47,44,38,29,22,15,11,7,5,3,2,1,1]
[1,9,27,47,57,58,49,40,30,22,15,11,7,5,3,2,1,1]
[1,9,30,54,70,71,65,52,41,30,22,15,11,7,5,3,2,1,1]
[1,10,33,64,84,90,82,70,54,42,30,22,15,11,7,5,3,2,1,1]
[1,10,37,72,101,110,105,89,73,55,42,30,22,15,11,7,5,3,2,1,1]
[1,11,40,84,119,136,131,116,94,75,56,42,30,22,15,11,7,5,3,2,1,1]


And here it is right justified:

[1]
[1,1]
[1,1,1]
[1,2,1,1]
[1,2,2,1,1]
[1,3,3,2,1,1]
[1,3,4,3,2,1,1]
[1,4,5,5,3,2,1,1]
[1,4,7,6,5,3,2,1,1]
[1,5,8,9,7,5,3,2,1,1]
[1,5,10,11,10,7,5,3,2,1,1]
[1,6,12,15,13,11,7,5,3,2,1,1]
[1,6,14,18,18,14,11,7,5,3,2,1,1]
[1,7,16,23,23,20,15,11,7,5,3,2,1,1]
[1,7,19,27,30,26,21,15,11,7,5,3,2,1,1]
[1,8,21,34,37,35,28,22,15,11,7,5,3,2,1,1]
[1,8,24,39,47,44,38,29,22,15,11,7,5,3,2,1,1]
[1,9,27,47,57,58,49,40,30,22,15,11,7,5,3,2,1,1]
[1,9,30,54,70,71,65,52,41,30,22,15,11,7,5,3,2,1,1]
[1,10,33,64,84,90,82,70,54,42,30,22,15,11,7,5,3,2,1,1]
[1,10,37,72,101,110,105,89,73,55,42,30,22,15,11,7,5,3,2,1,1]
[1,11,40,84,119,136,131,116,94,75,56,42,30,22,15,11,7,5,3,2,1,1]

We see some patterns.
  • p' n 2 == n `div` 2 for n > 1. There are n `div` 2 possible piles of size two, and each is either there or split into two piles of size one, so that makes sense.
  • Check out that trailing portion. There's a steadily growing hunk of the end that never changes. It's because the maximum possible pile of what's left is no longer constrained, so past a certain value of n, p' n (n - m) = p m.
Let's check that last one by looking at the initial hunk of map p [1..]:

[1,2,3,5,7,11,15,22,30,42,56,77,101,135,176,231,297,385,490,627]

So, it would seem that p n == p' (2 * n) (2 * n - n) == p' (2 * n) n. That gets rid of the sum at the p end, but adds to the work on the p' side; it's not any faster, and maybe even a little slower. Worse yet, this isn't like factorial, where the number of trailing zeroes never shrinks--we're liable to end up using Integer instead of Int for the values the problem wants, which will make things slower still.

What we're really doing is counting "integer partitions". The integer being partitioned is the total number of coins, and each pile is represented by the number of coins in it (all the coins are alike). I will have to study this further.

UPDATE: maybe we can generalize the argument we did for p' n 2. What can we say about p' n k? There are n `div` k possible piles of size k, all right, but the leftovers have more than one possible state, and it's not as easy to enumerate them so that the pile sizes stay monotonically non-increasing.

Monday, November 04, 2013

Data Structures

The conventional wisdom on data structures in functional languages is that immutability costs you. It was a big advance when Okasaki came up with algorithms that have an amortized complexity for lazy functional languages the same as the non-amortized complexity for the data structures with destructive update.

Work is going on in that area; check out Edward Kmett's post "Part I: Deamortized ST" about how to come up with a way to make the amortization unnecessary. (It's a bit reminiscent of Asimov's essay "Behind the Teacher's Back".) I'm hoping that will ultimately mean Haskell will be usable in still more situations.

(Actually, I should just say "read Edward Kmett's blog" period. It's all very good stuff.)

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 union s, Rust enum s, Haskell type s) and product types (e.g. tup...