Saturday, May 04, 2013

Palindromes again?

Actually, it's more Haskell than palindromes.

I've been looking over other people's solutions to the "Fair and Square" problem, and they are pretty impressive. As a still-inexperienced Haskeller, I am taking inspiration from then rather than feeling discouraged.

In particular, jagd's solution reminded me that I was not thinking Haskell. Haskell is lazy; values aren't calculated until you require them. That lets you define infinite data structures! You only get into trouble if you do something that requires the whole thing.

For example, [1..] is the infinite list of all positive integers. You can safely use it, as long as you only use a finite amount of it. For example, if you try

uptoSqrt n = [x | x <- [1..], x * x <= n]

and then ask for

uptoSqrt 25

you will get back 1, 2, 3, 4, and 5, and then the function call will hang. You and I  realize that \x -> x * x is monotonically increasing for positive numbers, but Haskell doesn't, and obligingly tries to check the infinitely many integers greater than 5 to see whether any have squares no greater than 25. That takes forever, so it never finishes.

To get the result you want, you will have to write something like

uptoSqrt n = [x | x <- (takeWhile (\y -> y * y <= n) [1..])]

What's takeWhile? Here's a declaration for it:

takeWhile :: (a -> Bool) -> [a] -> [a]

Give it a function that takes an a and returns a Bool, and a list of as, and it will hand you back a prefix of that list--all the items up to, but not including, the first one for which the function returns false.

With that definition,

uptoSqrt 25

gives you the

[1,2,3,4,5]

you expect, because we know that [1..] is in ascending order and the square function is monotonically increasing.

(There's nothing magic about takeWhile; if you ask for

takeWhile even [2, 4..]

it will print even numbers forever, because [2, 4..] is the infinite list of positive even integers, and hence takeWhile will never find an odd number to make it stop.)

That means that I can simply define the whole infinite list of "fair and square" numbers. To make it possible to get those in a certain range, I will have to generate them in ascending order, so back comes the sort that we omitted for a while. I'm going to do something that clashes a bit with a Haskell convention for patterns in function definitions, because it matches the nomenclature of the problem discussion, in which the "fair and square" numbers are referred to as X and the palindromes they are squares of as Y.

possibles nDigits =
    if      nDigits == 1 then [0..3]
    else if nDigits == 2 then [11, 22]
                         else sort (concat [noTwos nDigits, oneTwo nDigits, twoTwos nDigits])


ys = concat [possibles n | n <- [1..]]
xs = [y * y | y <- ys]

palSquares m n = takeWhile (<= n) (dropWhile (< m) xs)

dropWhile is kind of like takeWhile, but it gives you a suffix of the list you hand it, namely everything from the first item for which the function returns false on. If you drop the values less than m from the front of a sorted list, and then grab the values <= n from the front of what's left, you have precisely those values between m and n inclusive. (I'm being a bit free with my language here, of course. Haskell doesn't destroy the original list--at least not if anyone's looking.)

Note that we aren't using floorSqrt or ceilSqrt any more, so they can go away, and we thus aren't counting bits any more.  We're defining the whole list, so we aren't counting decimal digits, either, and digitsIn can go away as well.

So do infinite data structures provide any advantage? Well, consider that I have yet to actually solve the Code Jam problem as such. Remember that it requires reading a file with ranges of values m and n for which we have to print a line of a given format with the value of

length (palSquares m n)

Defining the infinite list of Xs is a form of memoization, remembering the values of a function you calculate so you don't have to do the work if you invoke it again with the same arguments. Suppose two of the input lines are

200 1000
40 500

Evaluating

palSquares 200 1000

will evaluate all the Xs between 0 and 1000 (actually one more past that, because you have to find the stopping place), so that when it comes time to process the next line, all the values between 40 and 500 will have already been determined, so all we have to do is select the values in range.

It's not what you'd want for serious memoization; after all, the input lines could all be within a small interval of the upper bound, so we would arguably have wasted the determination of all those lesser "fair and square" numbers. But it does have the potential of saving time. Let's see what happens with the Code Jam inputs (and with jagd's version of main)...

Cool! Of course, I didn't submit these officially--I'm using a portion of someone else's solution, and peeked at the analysis of the problem, and took far longer to write the program than the rules permit, so it would in no way be fair to take credit--but Google let me download the inputs, and confirmed that the output generated is correct. Here's the time output, first for large input 1, 10K lines with ranges a subset of [1, 1014]:

real    0m0.229s
user    0m0.220s
sys     0m0.008s


and then for large input 2, 1K lines with a range of [1, 10100]:

real    0m1.796s
user    0m1.784s
sys     0m0.012s


Not shabby for time--and to refer back to Mr. Reeder's blog post, I'm sure that with those times, an interpreted version would have no trouble running in the eight minute limit Code Jam imposes--but how about memory? Time to instrument again... and sure enough, keeping that list, or perhaps those lists, around eats RAM. With large input 1, it uses 60+ kilobytes of RAM (the graph generated doesn't show the scale all the way up the Y (RAM) axis), and with large input 2, it uses about five megabytes of RAM. Even the five megabytes is small change these days, and now I suspect I wasn't looking closely at which run I was checking memory usage for last time around. I bet I thought a 10100 run was a 1014 run, and I shouldn't have blamed the sort at all.

UPDATE: I'm truly amazed at the brevity of jagd's solution; you really should check it out. I will have a good time studying it further.  Just for the heck of it, I compiled it with ghc -O2, and tried it on the large input files.  For large input 1,

real    0m0.226s
user    0m0.220s
sys     0m0.008s


For large input 2,

real    0m14.410s
user    0m14.200s
sys     0m0.184s


I wonder what it's doing differently to make it run 63.8 times slower on the larger input rather than 7.84 times slower?

UPDATE:  part of it is the way it's keeping the list of Ys both in numeric and string form, with lots of conversions back and forth. That means more memory; on large input 2, jagd's program uses over 55 Mbytes of RAM, with a very interesting shape to the plot of RAM versus time.

I see that jagd also at least thinks he or she is generating some values that might not be valid Ys, given the

filter (ispalindrome . square)

applied to potential Ys. Maybe that's part of the RAM usage fluctuation.

("." is function composition in Haskell, and filter has a signature that should be familiar to you from takeWhile:

filter :: (a -> Bool) -> [a] -> [a]

filter takes a function and a list and hands  you back all the items in the list for which the function returns true.)

Thursday, May 02, 2013

Fun with palindromes, and the joys of Haskell

An item in Hacker News caught my eye recently. (Ouch!) It linked to a post, "Ruby is Too Slow for Programming Competitions", by a gentleman named Clif Reeder. A summary: Mr. Reeder chose Ruby to try on a Google Code Jam qualification round problem, and found that his solution ran too slowly to meet the Code Jam time constraints (on the official runs, you have eight minutes from downloading the input to upload your output). A version written in the (compiled) Go language ran much faster.

Commenters on the HN item pointed out
  • Mr. Reeder didn't think through the problem fully; there were more possible optimizations. In fact, there's a list of 387 people who used Ruby to successfully handle either the small sample input (which has a four-minute time limit) or the large or really large inputs (with the eight-minute time limit).
  • Like C, the size of int in Go may vary, so the Go version might or might not be able to handle the larger inputs correctly.
There was also a long discussion of whether it's really worth writing programs in a higher-level language and dropping to a lower-overhead language for the time-critical parts (something I thought was SOP these days).

Anyway, I thought I'd add my voice to the chorus of "it's the algorithm, not the language" (or more accurately, if the algorithm isn't efficient, compiled vs. interpreted or high versus low level language doesn't matter) by writing in Haskell. The goal: writing code as clear as I can manage while still solving the problem quickly enough.

(I hasten to add that I am no Haskell expert, and I hope to see someone put Haskell to better, more idiomatic use while maintaining or improving clarity.)

Strictly speaking, I'm writing code that solves the main part of the problem:
Given natural numbers M and N, how many numbers between M and N inclusive are both (base ten) palindromes and squares of numbers which are (base ten) palindromes?
More about the full problem later. For now, let's refer to an example of the kind of number we want as X, where X is a palindrome and also equal to the square of a palindrome Y, as the Code Jam analysis of the problem does.

When you look at the problem, some possible optimizations over the brute force method come to mind:
  1. Rather than look in [M, N] for Xs, look over values in [⌈sqrt(M)⌉, ⌊sqrt(N)⌋] for Ys. Mr. Reeder took advantage of this in his program; it helps a great deal.
  2. Rather than looking at each number in the range for palindromes, better to generate the palindromes in the range. A palindrome is determined by half its digits (unless it has an odd number of digits, in which case there's another digit in the middle that's not constrained to match any other digit), so that gives another reduction on the same order as (1).
  3. If Y squared has to be a palindrome, Y is still further constrained. Suppose Y = 4.....4. Then Y squared has the form n....6, where 16 ≤ n < 25, so it can't be a palindrome. A similar argument rules out five through nine, cutting the work down by two thirds.
That, alas, is as far as I took the analysis. With that much, straightforward Haskell code will spit out all the "fair and square" numbers, as the problem statement calls them, between 1 and 1014 in about a second on my Eee 900A netbook. That range includes all the values in the smaller large input file.

Fortunately, it's possible to do much better than that! Google's full analysis of the problem can be found on the Code Jam site. To summarize it:
  1. Given the constraint on Y's leading and trailing digits, there can't be a carry out of the most significant digit when you add up the partial products of Y * Y to get X, so if Y has n digits, X has 2n - 1 digits, an odd number.
  2. In fact, there's no carry out of any position in the sum of the partial products of Y and Y; that's both a necessary and sufficient condition for X = Y * Y to be a palindrome if Y is a palindrome.
  3. The middle digit of X is thus (since Y is a palindrome) the sum of the squares of the digits of Y, and must be no bigger than 9.
It follows immediately from (3) is that the only Y containing the digit 3 is 3 itself. All other possible Y values can only contain the digits 0, 1, and 2, and can be broken down into three cases:
  • No 2s at all. They can have up to nine 1s if they have an odd number of digits, eight if they have an even number of digits.
  • One 2. The 2 must be the middle digit, requiring an odd number of digits, and the rest of Y can have no more than four ones (if there were five, Y couldn't be a palindrome). Aside from 2, there must be at least two ones, the most and least significant digits.
  • Two 2s. This allows at most one 1, and that only in the middle, which can only happen if there are an odd number of digits.
 Any other digits in Y must be zeroes.

Just constraining the digits to 0, 1, and 2 did speed things up somewhat; the resulting program managed to get up into the 48-digit Xs in about three minutes, but that's nowhere near fast enough to handle the largest test input, which has values up to 10100, in the eight minutes Code Jam limits you to. To do it right, you have to seriously skip values that don't meet the sum of squares constraint.

Here's the code I ended up with. I split out the one- and two-digit cases so that the noTwos, oneTwo, and twoTwos functions don't have to bother with them.

I should point out that
  • I did this without the time constraints of Code Jam, over about eight days after the Hacker News post
  • Of course, you're not seeing the mistakes I made over that eight days!
  • The integer square root function came from an answer to a Stack Overflow question, which in turn was taken from Crandall and Pomerance, "Prime Numbers: a Computational Perspective"
  • The trivial Main is adapted from Chapter 25 of Real World Haskell having to do with profiling (about which more later)
import Prelude
import System.Environment
import Text.Printf


backwards :: Integer -> Integer

backwards n = backwards' n 0
    where backwards' n accum = if n < 10 then n + accum
                                         else backwards' (n `div` 10) (10 * (n `mod` 10 + accum))

numPalindrome :: Integer -> Bool

numPalindrome n = (n == backwards n)


twoTwos :: Integer -> [Integer]

twoTwos nDigits =
    let justTwos = 2 * 10 ^ (nDigits - 1) + 2
    in
        if even nDigits then [justTwos]
                        else [justTwos, justTwos + 10 ^ (nDigits `div` 2)]


choose :: Integer -> Integer-> [Integer]

nDigits `choose` nOnes =
    if      nOnes   == 0     then [0]
    else if nDigits == 1     then [1]
    else if nDigits == nOnes then [sum [10 ^ n | n <- [0..nDigits - 1]]]
    else                          concat [[0 + 10 * x | x <- (nDigits - 1) `choose` nOnes],
                                          [1 + 10 * x | x <- (nDigits - 1) `choose` (nOnes - 1)]]

chooseRange :: Integer -> [Integer] -> [Integer]

chooseRange nDigits onesRange = concat [nDigits `choose` nOnes | nOnes <- onesRange, nOnes <= nDigits]


oneTwo nDigits =
    if even nDigits then []
                    else
                         let msd = 10 ^ (nDigits `div` 2 - 1)
                         in  [((msd + x) * 10 + 2) * 10 ^ (nDigits `div` 2) + backwards (msd + x) |
                                      x <- chooseRange (nDigits `div` 2 - 1) [0..1]]

noTwos :: Integer -> [Integer]

noTwos nDigits =
    let halfN   = nDigits `div` 2
        msd     = 10 ^ (halfN - 1)
        choices = chooseRange (halfN - 1) [0..3]
    in
        if even nDigits then [         (msd + x)           * 10 ^ halfN + backwards (msd + x) | x <- choices]
                        else concat [[((msd + x) * 10 + 1) * 10 ^ halfN + backwards (msd + x),
                                       (msd + x) * 10      * 10 ^ halfN + backwards (msd + x)] | x <- choices]


possibles :: Integer -> [Integer]

possibles nDigits =
    if      nDigits == 1 then [0..3]
    else if nDigits == 2 then [11, 22]
                         else concat [noTwos nDigits, oneTwo nDigits, twoTwos nDigits]


digitsIn :: Integer -> Integer -> Integer

digitsIn base n = digitsIn' n 1
    where digitsIn' n count = if n < base then count else digitsIn' (n `div` base) (count + 1)


candidates :: Integer -> Integer -> [Integer]

candidates m n = [x | x <- concat (map possibles [digitsIn 10 m..digitsIn 10 n]), x >= m, x <= n]

floorSqrt :: Integer -> Integer

floorSqrt n = if n == 0 then 0 else floorSqrt' (2 ^ ((1 + digitsIn 2 n) `div` 2))
    where floorSqrt' x =
               let y = (x + n `div` x) `div` 2
               in if y >= x then x else floorSqrt' y

ceilSqrt :: Integer -> Integer

ceilSqrt n =
    let y = floorSqrt n
    in if y * y == n then y else y + 1


palSquares :: Integer -> Integer -> [Integer]

palSquares m n = [x * x | x <- candidates (ceilSqrt m) (floorSqrt n)]


main = do
    [d] <- map read `fmap` getArgs
    printf "%d\n" (length (palSquares 1 d))


Now, the times mentioned above are for runs without main, under ghci, an interpreter, using a version that sorted the results of possibles to make the output easier to check against the values at "Palindromic Square Numbers". I should also mention that under ghci it was able (again, with the sort) to generate the list of "fair and square" numbers between 1 and 10100 in about 28 seconds on my 2.8 GHz Athlon II X4 630 processor. Some of that time was I/O.

Profiling meant using the compiler, and what a difference that made! The first version of main didn't bother to read the command line arguments, and instead wired in 10100 as the upper bound. Compiling with -O2, it ran fast enough that I made the change to read the command line arguments lest it have optimized away all the calculations. That done, time output for a run, again using 10100 as the upper bound, showed

real  0m0.495s
user  0m0.480s
sys   0m0.012s

With the sort pulled, the time output was

real  0m0.389s
user  0m0.380s
sys   0m0.004s

The sort also turned out to be a major source of memory usage. With sort, it was using nearly four megabytes of RAM. Without, the profile showed it using more like 40K.

Now, you couldn't use the interpreted version in the straightforward way to successfully pass the Code Jam large and larger input tests; you'd want to take advantage of their giving the range of values, generate the list and then convert it to a structure that lends itself to quickly determining how many values are in a given range. (Sorted list, perhaps a search tree...)

The compiled version, on the other hand...  the smaller large input file has no more than 10000 lines, confined to [1, 1014].  Results of time on that range:

real    0m0.003s
user    0m0.004s
sys     0m0.000s


If every input line covered the whole range, that would be thirty seconds, plus the time for I/O--well within the eight minute range.

The larger large input file has no more than 1000 lines, confined to [1, 10100]. If every input line covered the whole ranges, that would be 389 seconds plus I/O time. That's cutting it rather close, but it's still possible.

So, have we shown what we set out to show? Clearly compiled Haskell is far faster than interpreted Haskell. (To give a better comparison, since I ran the interpreted cases on my Eee 900A--the compiled version ran in about 2.5 seconds on the larger range on the Eee.) But that's not the point; the optimized algorithm could trivially run the first large data set even interpreted, and with the precalculation could probably do the larger large data set interpreted.

All that said, the revelation for me was this: it was incredibly easy to write the code, straightforward to test and debug in the interpreter, and I had only to add import lines and the simple main to be able to compile it. Total: sixty-three (nonblank) lines of code. It's helped along by Haskell having arbitrary-precision integers and list comprehensions, which I used liberally.

One could argue that this problem is one particularly well-suited to Haskell, but for me this was a sufficiently enjoyable task that I will do more with Haskell, and I urge you to give it a try too.

UPDATE: Looks like Ruby has Bignum, and various people have posted fairly concise-looking constructs to get the effect of list comprehensions.  It might be fun to rewrite the above in Ruby.

UPDATE: Here are the solutions in Haskell from the Code Jam web site. Check out jadg's code; it's very impressively short. I'll be studying that one.

Wednesday, May 01, 2013

Night of the Living Blog

A good friend nudged me to start putting up stuff I am working/have worked on, so Aging Nerd Notes is shuddering back to activity after what, eight years? I'm amazed Google has kept it around so long.

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 union s, Rust enum s, Haskell type s) and product types (e.g. tup...