Sunday, June 02, 2013

After the smoke clears...

OK. I think I've taken this problem as far as I can go... though someday I will set it up to read and write named files so I can run it in ghci. After all, this all started because I wanted to demonstrate that it's the algorithm, not the language or whether it's interpreted or compiled after reading Clif Reeder's "Ruby is Too Slow for Programming Competitions". On the other hand, I now have a Haskell program that's within spitting distance of running twice as fast as a C++ solution for the problem, taking a bit over 0.2 seconds to run the hairier official input file. Eight minutes, the time Code Jam gives you to run and send back results when it counts, is 480 seconds; I tend to doubt that interpreter overhead would come near a factor of 2,400, in which case the point is made.

This started with the less-than-noble motivation of "Well, I'll show him...", but quickly changed, for a couple of reasons.
  • The unalloyed pleasure of problem solving always wins.
  • Haskell is a sheer delight to use.
It's been quite a while since I've had this much fun programming. Haskell made it easy to clearly and simply say what I meant, and when it came time to speed it up, the tools were there, and even the optimized versions were still intelligible. Not to mention that while there were bugs, there weren't many and they were easy to find. I could pull pieces into ghci and test if need be, and need wasn't all that often. All this (aside from Twan van Laarhoven's semilattice search trees) done by a Haskell newbie and duffer pushing 60 (with 40+ years of indoctrination in imperative programming aside from a bit of Lisp here and there), in spare time on evenings and (mostly) weekends.

Here's the result. You'll need the semilattice code, which is available at the link above. I compiled with -O2 in the interests of speed.

{-
  OK, here's what I hope is the ultimate "Fair and Square" program.
  (See https://code.google.com/codejam/contest/2270488/dashboard#s=p2;
  the core of the problem is, given two positive integers, M and N, determine
  the number of base 10 palindromes between M and N inclusive that are squares
  of base 10 palindromes.)

  Best time output, reading the L2 input file, on a 2.8 GHz AMD Athlon II
  X4 630:

  real    0m0.227s
  user    0m0.208s
  sys     0m0.016s

  [Ultimate? Silly me; there's always something else to try. However, I think
  I'm done with it, for now at least.]

  The main insights to speed up the process follow.
  (4)-(6) are directly from the Google Code Jam analysis of the problem at
  https://code.google.com/codejam/contest/2270488/dashboard#s=a&a=2
  (9) is from Twan van Laarhoven's "Search trees without sorting", on his blog
  at http://twanvl.nl/blog/haskell/SemilatticeSearchTree,

  1. Look for palindromes in [ceil(sqrt(M))..floor(sqrt(N))] whose squares are
     palindromes. (We'll refer to one of the former set as X, and one of the
     latter set as Y, as is done in Google's analysis.)
  2. Don't look for palindromes, generate them. They're determined (aside from
     the middle digit if there are an odd number) completely by half their
     digits, so in a range of 10^(2*n) integers, there are on the order of
     10^n palindromes, a reduction on the order of (1).
  3. If Y^2 is to be a palindome, its most/least significant digit can't be
     greater than 3.
  4. In particular, the Ys are precisely those that, when you square them via
     long multiplication, have no carries in the addition of partial products.
  5. (3) implies that the Xs have an odd number of digits.
  6. (3) also implies that the middle digit of X, which is the sum of the
     squares of the digits of the palindrome being squared, must be no larger
     than 9. Therefore, *none* of the digits of Y can be greater than 3, and
     3 only appears in Y = 3.
  7. (6) implies that Ys come in three flavors: those with two 2s, those with
     one 2, and those with no 2s.
  8. Save for very small n, it's easier to count d-digit Ys than to generate
     them--and remember, the problem only asks for the count.
  9. By treating a binary tree as a sublattice, labeling branches with the
     upper bounds of the corresponding subtrees, one can quickly search for
     the first value in the tree >= a specified value.
 10. By labelling the sorted Ys for a given number of digits with their ordinal
     positions, one can use those tree searches to quickly determine how many
     Ys are in a given interval.

  NOTE: 0 is a one digit palindrome, and its square is, too, but since we are
  only dealing with positive integers, we'll ignore zero.
-}

import Prelude
import Data.List
import SemilatticeSearchTree
import qualified Data.ByteString.Char8 as B

-- It's easy to calculate the number of n-digit Ys.

numNDigitYs :: Int -> Int

numNDigitYs n = if n == 1 then 3 else numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos d = if even d then 1 else 2
          numOneTwos d = if even d then 0 else d `div` 2
          numNoTwos  d = let d' = d `div` 2 - 1
                             n' = sum [d' `choose` i | i <- [0..min d' 3]]
                         in if even d then n' else 2 * n'
                                   
choose :: Int -> Int -> Int

m `choose` 0 = 1
m `choose` n = product [m - n + 1..m] `div` product [1..n]                            

-- We may wish to memoize numNDigitYs...

ysInDigitRange d1 d2 = sum [numNDigitYs d | d <- [d1..d2]]

{-
  Now the slow part: generating and counting Ys in a range other than "all
  d-digit numbers".
-}

powersOfTen = [10 ^ n | n <- [0..]] -- or, if you wish, map (10 ^) [0..]

tenToThe :: Int -> Integer

tenToThe n = powersOfTen !! n

twoTwos :: Int -> [Integer]

twoTwos n
    | n == 1    = []
    | even n    = [twoTwosNoOnes]
    | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe (n `div` 2)]               
    where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2

oneTwos :: Int -> [Integer]

oneTwos n
    | n == 1     = [2]
    | even n     = []
    | otherwise  = [p + 2 * tenToThe halfN
                      | p <- justOnes n (min 1 (halfN - 1))]
    where halfN = n `div` 2

noTwos :: Int -> [Integer]

noTwos n
    | n == 1     = [1,3]
    | even n     = base
    | otherwise  = concat [[p, p + tenToThe halfN] | p <- base]
    where halfN = n `div` 2
          base = justOnes n (min 3 (halfN - 1))

{-
  justOnes -- generate the palindromes with a specified number of digits all of
  whose non-zero digits are 1s. 1 must be the most/least significant digit; up
  to a specified number of 1s are scattered through the rest of each half.
  If the number of digits requested is odd, we leave a 0 in the middle digit,
  for the caller to either replace with another digit or not.
-}

justOnes :: Int -> Int -> [Integer]

justOnes n o =
    let halfN = n `div` 2
        shell = tenToThe (n - 1) + 1
        innards = concat [(halfN - 1) `choices` k | k <- [0..o]]
        -- originally pair i = tenToThe i + tenToThe (n - (i + 1))
        ascending  = take (halfN - 1) (drop 1 powersOfTen)
        descending = take (halfN - 1) (reverse (take (n - 1) powersOfTen))
        memoPair = zipWith (+) ascending descending
        pair i = memoPair !! (i - 1)
    in [foldl' (+) shell (map pair c) | c <- innards]

choices :: Int -> Int-> [[Int]]

m `choices` n
    | n == 0           = [[]]
    | m == n           = [[m, m-1..1]]
    | otherwise        = [m : c | c <- (m - 1) `choices` (n - 1)]
                         ++ ((m - 1) `choices` n)

nDigitYs :: Int -> [Integer]

nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)

-- We turn the sorted lists from nDigitYs into search trees.
-- Here we use van Laarhoven's semilattice search tree routines/types.
-- Sigh... we're doing a bit of cargo cult programming here, because even
-- though we'll never search on the ordinal index, we'll follow the blog
-- post example.

yTree :: Int -> SearchTree (Max Integer, Max Int)

yTree n = fromList [(Max x, Max y) | (x, y) <- zip (nDigitYs n) [0..]]

yTrees = map yTree [1..]

dDigitYTree :: Int -> SearchTree (Max Integer, Max Int)

dDigitYTree d = yTrees !! (d - 1)

-- ...leading up to this function.

ysInRange :: Integer -> Integer -> Int -> Int

ysInRange m n d
      | nVal == n = nPos - mPos + 1
      | otherwise = nPos - mPos
      where (mVal, mPos) = findFirst' m
            (nVal, nPos) = findFirst' n
            findFirst' x = case findFirst (Ge x, Any) (dDigitYTree d) of
                               Just (Max i, Max j) -> (i, j)
                               Nothing             -> (tenToThe d, numNDigitYs d)

-- counting decimal digits and bits
-- the numbers here are big enough to nudge us past the obvious

powersOfTwo = map (2^) [0..]

bigTwos = map (2^) powersOfTwo
bigTens = map (10^) powersOfTwo

bitsIn n   = bDigits n bigTwos
digitsIn n = bDigits n bigTens

bDigits :: Integer -> [Integer] -> Int

bDigits n xs = bDigits' n 1
    where bDigits' n s = let upToN = takeWhile (<= n) xs
                         in case upToN of
                            [] -> s
                            _  -> bDigits' (n `div` (last upToN))
                                           (s + powersOfTwo !! (length upToN - 1))

-- The following is derived from a response to a Stack Overflow question
-- http://stackoverflow.com/questions/1623375/writing-your-own-square-root-function
-- which in turn is taken from Crandall & Pomerance, "Prime Numbers: A
-- Computational Perspective".

floorSqrt :: Integer -> Integer

floorSqrt 0 = 0
floorSqrt n = floorSqrt' (2 ^ ((1 + bitsIn n) `div` 2))
    where floorSqrt' x =
               let y = (x + n `div` x) `div` 2
               in if y >= x then x else floorSqrt' y

-- then the obvious...

ceilSqrt :: Integer -> Integer

ceilSqrt n =
    let y = floorSqrt n
    in if y * y == n then y else y + 1

-- Finally, the top level, which counts where it can, and generates, filters,
-- and counts where it must. Note that we break the interval down into ranges
-- that are subsets of d-digit numbers for appropriate values of d.

numYs :: Integer -> Integer -> Int

numYs m n
    | mDigits == nDigits   = ysInRange m n mDigits
    | mPower10 && nPower10 = ysInDigitRange mDigits (nDigits - 1)
    | mPower10             = ysInDigitRange mDigits (nDigits - 1)
                             + ysInRange (tenToThe (nDigits - 1) + 1) n nDigits
    | otherwise            = ysInRange m (tenToThe mDigits - 1) mDigits
                             + ysInDigitRange (mDigits + 1) (nDigits - 1)
                             + ysInRange (tenToThe (nDigits - 1) + 1) n nDigits
    where mDigits    = digitsIn m
          nDigits    = digitsIn n
          mPower10 = m == tenToThe (mDigits - 1)
          nPower10 = n == tenToThe (nDigits - 1)

numXs :: Integer -> Integer -> Int

numXs m n = numYs (ceilSqrt m) (floorSqrt n)

-- A new main, heavily inspired by "Haskell I/O for Imperative Programmers"
-- http://www.haskell.org/haskellwiki/Haskell_IO_for_Imperative_Programmers
-- (What's with that tail, you ask? We're skipping that first line that says
-- how many more lines there are. They could have been persnickety and padded
-- the file with additional lines to make sure you have to pay attention to
-- that first line, but they weren't.

main = do
    s <- B.getContents
    let r = map format $ zip [1..] (map process (tail $ B.lines s))
    putStr (unlines r)

process :: B.ByteString -> Int

process line = case map B.readInteger (B.words line) of
               [Just (m, _), Just (n, _)] -> numXs m n
               _                          -> -1

format :: (Int, Int) -> String

format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result

No comments:

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...