Sunday, October 20, 2013

A new problem

OK, let's try a new problem this time: Project Euler #92. I happened across a YouTube video of a "live coding" session by Daniel Silverstone. It's worth your while. I admire the courage of someone willing to show such an unedited session, typos and all, and besides, the video shows just how expressive Haskell is.

So, having seen Mr. Silverstone's video, this is another case in which I can't pretend to having had all the insights myself. I will write the code, though; it's not copy and paste.

The problem has to do with a function that maps n to the sum of the squares of the digits of n expressed in base ten. I can't quite bring myself to use show, so...

sumSqDigits :: Int -> Int
sumSqDigits n = sumSqDigits' n 0
    where sumSqDigits' n !s
           | n < 10    = n * n + s
           | otherwise = let (q, r) = divMod n 10 in sumSqDigits' q (r * r + s)

Now, it can be shown that for non-zero n, iterate sumSqDigits n will "end" with one of
  • repeat 1
  • concat $ repeat [89, 145, 42, 20, 4, 16, 37, 58]
So, Project Euler problem #92 is the musical question "how many of the positive integers less than ten million are of the second kind?"

We'll borrow Mr. Silverstone's function name...

terminator :: Int -> Int
terminator n
    | n == 1 || n == 89 = n
    | otherwise         = terminator $ sumSqDigits n

Well, then just type

length . filter (== 89) $ map terminator [1...9999999]

and you're done, right? Well, you are if you're willing to wait.  It is, after all, evaluating sumSqDigits rather a lot. Mr. Silverstone uses the State monad to memoize terminator and speed things up quite a bit.

How best to do this? There are only two values, so one could argue that we're wasting space saving terminator values; we can recast the function as

terminatesWith89 :: Int -> Bool
terminatesWith89 n = n == 89 || terminatesWith89 (sumSqDigits n)

which in turn suggests a bitmap, since we have a bound on the values we're looking at. Ten million bits isn't all that much--at least not these days. But we can do better than that. After all, sumSqDigits takes an n-digit number to at most 81 * n. We're just interested in numbers of up to seven digits, so we need only memoize up to 7 * 81 = 567. Seventy-one bytes beats 1.2 megabytes any day.

For that matter, we can decide on how much space we're willing to devote to it, and just apply sumSqDigits until it's in range for our bitmap, which won't take too long. Say we start with n = a googol - 1. One hundred digits, so sumSqDigits n is at most 8100. That's four digits, so sumSqDgits of that is at most 243, and we're good to go.

With that it didn't take long to write up a 31-line (including "declarations" and blank lines) Haskell program that, when compiled with ghci -O2, solved the problem in a little over three seconds. We wrote a little helper function that followed the problem terminology:

chain :: Int -> [Int]
chain n = iterate sumSqDigits n

It proved nice to use along with dropWhile... but could we make it better? Well, we might have been able to speed up the generation of our bitmap, because everything on the same chain has the same terminator value, so we could have filled in a whole chain at once.

(Come to think of it, a chain that ends with 1, 1, 1, ... has to get there by way of a power of ten. Other values can get there, e.g. 86 (64 + 36 = 100);  is there some way they can be characterized? Maybe there's a closed form solution.)

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 union s, Rust enum s, Haskell type s) and product types (e.g. tup...