Saturday, May 18, 2013

A little finesse...

It took a little bit to get this right:

choices :: Int -> Int -> [[Int]]

m `choices` n
    | n == 0           = [[]]
    | m == n           = [[m,m-1..1]]
    | otherwise        = [m : c | c <- (m - 1) `choices` (n - 1)]
                         ++ ((m - 1) `choices` n)


The gotcha is the n == 0 case; if you have it return [], the list comprehension gives the wrong result (i.e. an empty list!) when n == 1. (I didn't really need to make the m == n case go in descending order, but it looked nicer.)
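For instance, pasting the definition into ghci (my own check, not from the original post):

```haskell
-- The definition above, repeated so this snippet stands alone.
choices :: Int -> Int -> [[Int]]
m `choices` n
    | n == 0    = [[]]
    | m == n    = [[m, m-1 .. 1]]
    | otherwise = [m : c | c <- (m - 1) `choices` (n - 1)]
                  ++ ((m - 1) `choices` n)

-- 3 `choices` 2 yields [[3,2],[3,1],[2,1]].  Had the base case returned
-- [] instead of [[]], the comprehension [m : c | ...] would be empty
-- whenever n reached 1, silently dropping every choice that includes m.
```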

Shouldn't take long to recast the palindrome program now; the idea, as alluded to earlier, is to represent half-Ys (aside from the easy-to-generate twoTwos) not by an Integer but by a list of the positions where we place 1s. Then backwards doesn't have to take the half-palindrome apart decimal digit by decimal digit, but can just reflect the positions and directly add up the corresponding powers of ten... but OTOH, do I really want to churn out a bunch of lists? For 50-digit Ys, that's up to 25 choose 3, or 2,300. If I go to representing them as a bit mask, I'm just chugging through bits rather than decimal digits--which would be faster, so perhaps that's the thing to do... or we generate both halves at the same time, while we have the information.
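The bit-mask flavor might look something like this (a sketch with made-up names, not the program's code; bit i set means the digit at position i of the full d-digit palindrome is a 1, and I'm assuming a 64-bit Int so positions up to 49 fit):

```haskell
import Data.Bits (setBit, testBit)

-- Mirror the half's 1-bits into the low half: position i of a d-digit
-- palindrome reflects to position d - 1 - i.
reflectMask :: Int -> Int -> Int
reflectMask d mask = foldl mirror mask [0 .. d - 1]
  where mirror m i | testBit mask i = setBit m (d - 1 - i)
                   | otherwise      = m

-- Only now build an Integer: add up the powers of ten for the set bits.
-- 49 is the highest position a 50-digit palindrome can use.
valueFromMask :: Int -> Integer
valueFromMask mask = sum [10 ^ i | i <- [0 .. 49], testBit mask i]
```

So `valueFromMask (reflectMask 5 20)` (20 = bits 4 and 2 set) builds 10101 without ever dividing by ten.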

Friday, May 17, 2013

Going backwards

I figured I might try the same "loop unrolling" with backwards' as worked so well with digitsIn'. Admittedly I only did it once, but the results didn't go the right way:

COST CENTRE                MODULE  %time %alloc

numWithin                  Main     29.5    7.2
solve                      Main     21.0   35.3
backwards.backwards'       Main     16.6   17.5
backwards.backwards'.(...) Main      9.5   12.6
choices                    Main      5.2    8.9
digitsIn.digitsIn'         Main      5.2    4.9
nDigitYs                   Main      4.6    5.8
tenToThe                   Main      1.8    0.0
floorSqrt.floorSqrt'.y     Main      1.1    1.2
oddDigitsPals              Main      0.9    1.8
noTwos                     Main      0.7    1.3


As you can see by comparison with a previous profile, backwards is now taking up more time and allocation than before.

Guess we'll have to memoize...

backwardsMemo = [backwards' n 0 | n <- [0..]]
    where backwards' n accum
              | n < 10    = accum + n
              | otherwise = let (d, m) = n `divMod` 10
                            in backwards' d (10 * (m + accum))

backwards :: Integer -> Integer

backwards n = backwardsMemo `genericIndex` n


Um, no, we won't, not that way. In theory, at least, we should just be evaluating backwards a bit over 40,000 times, but then there are the calls to backwards'. After about two minutes I killed the process. (Does indexing cause all the intermediate items to be generated?)

We know that the tempting

backwards = read . reverse . show

loses big time in comparison with what we have already as well.

So, let's think about the particular values we're handing backwards:
  • For twoTwos, 2 * tenToThe n. Here, backwards will grind down a zero at a time and give us back 2, every time.
  • For oneTwo and noTwos, it will always be a sum of one or two (for oneTwo) or one to three (for noTwos) values of the form tenToThe i.
So... we blew it when we wrote our even- and odd-digit-count palindrome generators the way we did, and we don't want to generate them the way we are now. We don't want to represent our half-Ys as Integers, but instead as short lists of the powers of ten that you add to make them up. No Integers until we actually churn out the palindromes themselves. Then we can avoid a numeric backwards altogether.
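A sketch of that representation (hypothetical names; the eventual code may differ): a half-Y becomes the list of positions, counted from the units digit of the full d-digit palindrome, where a 1 goes, and mirroring is arithmetic on positions.

```haskell
import Data.List (nub)

-- Build a d-digit palindrome from the 1-positions of its top half.
-- Position i mirrors to position d - 1 - i; nub keeps an odd middle
-- digit from being counted twice.
palFromOnes :: Int -> [Int] -> Integer
palFromOnes d ps = sum [10 ^ i | i <- nub (ps ++ [d - 1 - i | i <- ps])]
```

For example, `palFromOnes 5 [4,2]` adds up 10^4, 10^2, and 10^0 to give 10101, with no reversal of anything.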

"How low can you go?"

(Hey, with that title I can just keep my mouth shut and young people will think I'm quoting Ludacris, while old people will think I'm quoting Chubby Checker! Oops...)

So I decided to create a dummy version of the program that instead of calculating the number of "fair and square" numbers in a given range, just adds the low and high end of the range and prints it out, using the format the Code Jam problem requires:

numXs :: Integer -> Integer -> Integer

numXs a b = a + b

-- jagd's main, which I believe is set up to read stdin

solve i num = do
    [a, b] <- getLine >>= return . map read . words
    putStrLn $ "Case #" ++ show i ++ ": " ++ (show $ numXs a b)
    if i < num
       then solve (i+1) num
       else return ()

main = do
   num <- getLine >>= return.read
   solve 1 num


Compiled it up with -O2 and ran...

time ./dummy <C-large-practice-2.in >woof.dummy

real    0m0.106s
user    0m0.092s
sys     0m0.012s


The run-to-run variance is hefty here, though; one run took 0.171 seconds!

The surprising part of the profiling was the memory allocation:

total alloc = 196,851,848 bytes  (excludes profiling overheads)

Compare that with the real program:

total alloc = 546,422,264 bytes  (excludes profiling overheads)

The full program is allocating lists of powers of ten, and potentially over 40,000 Y values! There's something very inefficient going on in solve.

UPDATE: There must be a difference between "total alloc" and memory usage. Running dummy with the options to track memory usage shows memory usage peaking at maybe 60K, compared with the palindrome program, which peaks quickly at around 4M. Both programs must be allocating and freeing quite a lot, the difference being that we save up those lists of Ys and powers of ten.

Thursday, May 16, 2013

Loop unrolling, Haskell style

We'd written digitsIn' to allow tail call optimization, so that the recursive call gets compiled as a loop:

digitsIn base n = digitsIn' n 1
    where digitsIn' n count
            | n < base  = count
            | otherwise = digitsIn' (n `div` base) (count + 1)


This does remind me of a blog post that Hacker News pointed at, in which another operation was being done in Haskell a digit at a time. The solution there, as here, is something kind of like loop unrolling:

digitsIn base n = digitsIn' n 1
    where base2 = base * base
          base4 = base2 * base2
          digitsIn' n count
            | n < base  = count
            | n > base4 = digitsIn' (n `div` base4) (count + 4)
            | n > base2 = digitsIn' (n `div` base2) (count + 2)
            | otherwise = digitsIn' (n `div` base)  (count + 1)


And the results:

real    0m0.679s
user    0m0.656s
sys     0m0.020s


From the profiler:

COST CENTRE                MODULE  %time %alloc

numWithin                  Main     31.6    7.0
solve                      Main     21.0   34.6
backwards.backwards'       Main     14.7   17.1
backwards.backwards'.(...) Main      9.0   12.9
choices                    Main      5.5   10.8
nDigitYs                   Main      4.8    5.7
digitsIn.digitsIn'         Main      4.5    4.8
oddDigitsPals              Main      1.3    1.8
noTwos                     Main      1.1    1.3
tenToThe                   Main      1.1    0.0
digitsIn                   Main      1.1    0.0
floorSqrt.floorSqrt'.y     Main      0.7    1.2


digitsIn' is down from almost 15% of the time to 4.5%, and from a sixth of the allocation to under a twentieth. We've gotten closer to the runtime of the C++ solution I grabbed and compiled, going from taking not quite twice as long to not quite 1.7 times as long. There ought to be a way to similarly speed up backwards'. (And what's up with solve?)

I'm really procrastinating on that data structure change, huh?

Wednesday, May 15, 2013

Guess we'll have to take the plunge

Just to check, I ran a version of the program with a revision of ysInRange, the function that actually does the range checking, to implement some of the alleged improvements mentioned earlier:

numWithin :: Integer -> Integer -> [Integer] -> Int

numWithin m n xs = length (takeWhile (<= n) $ dropWhile (< m) xs)

simpleYsInRange :: Integer -> Integer -> Int -> Int

simpleYsInRange m n digits = numWithin m n (ysByDigits !! (digits - 1))

-- Now for the real thing. As with the palindrome generation, the caller has
-- the number of digits of interest handy, so we take it as a parameter

ysInRange :: Integer -> Integer -> Int -> Int

ysInRange m n digits
    | digits < 8 = simpleYsInRange m n digits
    | mMSDs > 20 = 0
    | mMSDs > 11 = numWithin m n (twoTwos digits)
    | otherwise  = simpleYsInRange m n digits
    where mMSDs = m `div` tenToThe (digits - 2)


The results? Inconclusive; there's no clear difference. It's going to take a better data structure.

UPDATE:  here's the "cost center" listing from profiling output.

COST CENTRE                MODULE  %time %alloc

numWithin                  Main     28.2    6.1
solve                      Main     18.1   30.3
digitsIn.digitsIn'         Main     14.8   16.6
backwards.backwards'       Main     12.1   15.0
backwards.backwards'.(...) Main      9.8   11.4
nDigitYs                   Main      4.4    5.0
choices                    Main      4.3    9.4
tenToThe                   Main      1.5    0.0
floorSqrt.floorSqrt'.y     Main      1.3    1.0
digitsIn                   Main      1.3    0.0
noTwos                     Main      1.0    1.1
oddDigitsPals              Main      0.6    1.6


So it is indeed walking the lists of generated palindromes that's the time sink. (I do wonder about digitsIn and backwards', though; I guess arbitrary precision arithmetic will mean heap allocation, but that much?)

Tuesday, May 14, 2013

A tree? Maybe not...

So, how about that tree? It's tempting to do something like van Laarhoven describes, though I'd be inclined to put maxima on the left links, minima on the right links, to assist with the range check. But wait a minute. Now that we have added the purely combinatorial part, we never actually generate and selectively count more than one power of ten's worth of palindromes at a time, and at least some of those are at the trailing edge of an interval that covers a power of ten. Since the first d-digit Y is 1, for d == 1, or 10^(d-1) + 1, for d > 1, for them it's not like we have to skip a lot of Ys to find those of interest. That leaves the leading part, either [m..n] or [m..tenToThe mdigits - 1]. Just how many Ys are there for a given number of digits, anyway?

*Main> map numNDigitYs [1..50]
[3,2,5,3,8,5,13,9,22,16,37,27,60,43,93,65,138,94,197,131,272,177,365,233,478,300,613,379,772,471,957,577,1170,698,1413,835,1688,989,1997,1161,2342,1352,2725,1563,3148,1795,3613,2049,4122,2326]
*Main>

OK, up to 4122 is a bit long to grind through.

We can do some things:  for small d, might as well just scan as we are now. For bigger d, though, we have the one or two two-2 palindromes at the high end, and at the low end, 10.....01, optionally 10..020..01, and then no-2s and optional one-2s interleaved up to 11110...01111 (d even) or 11110..010...01111 (d odd) for d > 9. If d is 3 or 5, then the largest non-two-2 Y is 121 or 11211, else for d <=9 it's 111...111.

So, for sufficiently large d, if the first two digits of m are > 11, then there are at most two (the two-2 flavor) Ys in [m..n], depending on the parity of d and the value of n.

This will help when it lets us rule out no-2 and one-2 Ys, but when we have to dig into those, which account for the vast majority of the Ys with a given number of digits, we're back to scanning that list. Doing it right means coming up with a better data structure, one that lets us quickly determine how many values are in a given range.
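One structure that fits the bill, if we freeze each list of Ys into a sorted array once it's generated (my own sketch, with made-up names, and a plain binary search rather than the tree idea): find the index of the first Y >= m and of the first Y > n, and subtract.

```haskell
import Data.Array

-- Count the elements of a sorted ascending array that fall in [m, n],
-- via two binary searches instead of a linear scan.
countYsInRange :: Array Int Integer -> Integer -> Integer -> Int
countYsInRange arr m n = firstAtLeast (n + 1) - firstAtLeast m
  where
    (lo0, hi0) = bounds arr
    -- index of the first element >= key, or hi0 + 1 if there is none
    firstAtLeast key = go lo0 (hi0 + 1)
      where go lo hi
              | lo >= hi         = lo
              | arr ! mid >= key = go lo mid
              | otherwise        = go (mid + 1) hi
              where mid = (lo + hi) `div` 2
```

For example, `countYsInRange (listArray (0,4) [1,3,5,7,9]) 3 7` is 3, found in O(log n) comparisons instead of walking the list.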

Hmmm... come to think of it, can we get away with just generating the digits that suffice to determine each Y? Maybe those are easier to work with--it would certainly let us dump all the calls to backwards, which still takes a fair amount of time. The only issue will be the range checking--which we'd have to recast in terms of the half-palindromes. (UPDATE: Sigh, I guess not. You'd be doing it repeatedly rather than just once.)

Monday, May 13, 2013

Speaking of aging...

It's been some time since I've noticed my eyes don't accommodate as they used to--but if you take a look at this blog's layout and hit ctrl-0, that text is pretty darned small. Sorry about that; I've already changed the text color from what I think was the default for this style, #262626, to pure black, which helps, and bumped the text size a little bit. I'll tweak it some more; needing to hit  ctrl-+ a few times to read my own blog is pretty sad.

Sunday, May 12, 2013

"I think that I shall never see..."

I'm rummaging around Stack Overflow and Hoogle. Data.Set and Data.Map are implemented underneath as balanced trees, but for my purposes I need the tree nature exposed so I can do something like

data Tree = Empty | Node Integer Tree Tree

countInRange :: Tree -> Integer -> Integer -> Int
countInRange Empty _ _ = 0
countInRange (Node value lhs rhs) m n
    | value < m = countInRange rhs m n
    | value > n = countInRange lhs m n
    | otherwise = 1 + countInRange lhs m n
                    + countInRange rhs m n

The set fold doesn't guarantee ordering, so I don't see a way to let it quit early or otherwise skip portions that can't contribute to the count.

Maybe I will have to roll my own.

UPDATE: Twan van Laarhoven's "Search trees without sorting" looks promising for my purpose. Dank U wel.

Holy [expletive]!

jejones@eeyore:~/src/haskell_play$ time ./ultimatepalindrome <C-large-practice-2.in >ultimate2.out

real    0m0.866s
user    0m0.852s
sys     0m0.012s


I want to try a thing or two more before posting the source, but that beats the heck out of 1.7 seconds, and is sneaking up on that 0.4 second time from one of the C/C++ solutions.

Things to take away from this:
  • Even with Google's explanations of problems, it's worth looking further. I would like to think that the Code Jam folks did this on purpose; it's a good pedagogical technique. (Pamela McCorduck's Machines Who Think quotes Minsky lamenting that in the book he and Papert wrote on perceptrons, they pretty well did all the interesting stuff, leaving nothing for others to follow up on.)
  • In conversation about Mr. Reeder's blog post that started all this hoohah, a friend pointed out that whatever one does in some scripting or high-level language could be rewritten in C/assembly language/VLIW microcode and still be faster. That's true, but... to quote Weird Al Yankovic, "You could even cut a tin can with it--but you wouldn't want to!" Nowadays most of the time we accept the overhead of using C or [shudder] C++ rather than assembly language. We're now down to within a factor of a bit under two, and that's by a duffer fairly new to Haskell, fiddling with code off and on during spare time over roughly three weeks. Mr. Reeder was right; he just needed to push his analysis a bit further, as did I.
UPDATE: Thinking about it, breaking the problem down as this version does has the advantage that when we are generating and then walking the list of Ys, we're confining ourselves to a single power of ten range rather than walking the list all the way from 1 to wherever.

Then, too, we're not generating the Xs, but operating purely in the realm of Ys, which I'm sure saves some heap.

Profiling output claims that the code is spending 27.4% of its time and 34.7% of its allocations in backwards', and that it is the biggest "cost center" of the program. We made a point of writing it to permit tail call optimization, but perhaps there's another way to improve it.

UPDATE: divMod (or quotRem, depending on the situation) to the rescue!

These return a tuple with the quotient and the remainder (divMod is the floored flavor mathematicians expect; quotRem truncates toward zero, as C programmers expect), since division frequently hands you the remainder/modulus for free, so a simple tweak of backwards into

backwards n =
    let backwards' n accum
          | n < 10    = accum + n
          | otherwise = let (d, m) = n `divMod` 10
                        in backwards' d (10 * (m + accum))
    in backwards' n 0


took the time output to

real    0m0.779s
user    0m0.756s
sys     0m0.020s


and backwards' down to 13.8% of the time and 15.3% of the allocations. The new major "cost center", as one would expect, is now ysInRange, the function that determines how many of the (generated) Ys are within a specified interval. Let's see whether it's as easy to improve as backwards was. [pause] Probably not, at least not by perusing Hoogle (a search engine for Haskell's standard modules that you can query by name or by type signature). Time to create a different data structure; a tree, I bet.

UPDATE: Shucks. I remembered a blog post Hacker News linked to about some Haskell code in which the bottleneck was digit-at-a-time conversion, so I took a whack at a fancier digitsIn that reduced by larger powers of the base first, then took things a digit at a time. It didn't make much difference that I could see.

Looks promising...

A first cut at code that takes the aforementioned approach looks promising. Merging the purely combinatoric code with the code that generates palindromes gave rise to a name clash or two, I left out a couple of needed parentheses, Haskell didn't like my referring to some type by the name of Integber, I wrote [1..3] rather than [1,3] for the list of one digit Ys with no 2s, and I forgot that if m is a power of 10, digitsIn 10 m is one more than the common log of m. That corrected, a few lines from the big input give the same result as the previous version, and the new version was noticeably faster on some of them, which is what we were after. Now to compile and try the whole megillah. There are some things I should probably memoize, but get it right first, right?

By the way, ghci (the Glasgow Haskell Compiler's interactive environment) has a useful, um... OK. "Spell checker" isn't quite right, but if you make a typo, it will notice that it's close to something else that is defined, and say "Maybe you meant..." So far, it has always suggested what I meant.

I am highly impressed with how little Haskell gets in the way, and how easy it is to say what I mean. Of course, I haven't seriously entered the world of do and monads, so I should make a point of pushing myself there, but I expect it won't be the bête noire it's made out to be. I promise that once I do, I will not write yet another "monads are like containers/burritos/etc." tutorial. (In that context, don't miss Daniel Spiewak's article on the subject.)

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 union s, Rust enum s, Haskell type s) and product types (e.g. tup...