Holy [expletive]!

jejones@eeyore:~/src/haskell_play$ time ./ultimatepalindrome <C-large-practice-2.in >ultimate2.out

real    0m0.866s
user    0m0.852s
sys     0m0.012s

I want to try a thing or two more before posting the source, but that beats the heck out of 1.7 seconds, and is sneaking up on that 0.4 second time from one of the C/C++ solutions.

Things to take away from this:
  • Even with Google's explanations of problems, it's worth looking further. I would like to think that the Code Jam folks did this on purpose; it's a good pedagogical technique. (Pamela McCorduck's Machines Who Think quotes Minsky lamenting that in the book he and Papert wrote on perceptrons, they pretty well did all the interesting stuff, leaving nothing for others to follow up on.)
  • In conversation about Mr. Reeder's blog post that started all this hoohah, a friend pointed out that whatever one does in some scripting or high-level language could be rewritten in C/assembly language/VLIW microcode and still be faster. That's true, but... to quote Weird Al Yankovic, "You could even cut a tin can with it--but you wouldn't want to!" Nowadays most of the time we accept the overhead of using C or [shudder] C++ rather than assembly language. We're now down to within a factor of a bit overunder two, and that's by a duffer fairly new to Haskell, fiddling with code off and on during spare time over roughly three weeks. Mr. Reeder was right; he just needed to push his analysis a bit further, as did I.
UPDATE: Thinking about it, breaking the problem down as this version does has the advantage that when we are generating and then walking the list of Ys, we're confining ourselves to a single power of ten range rather than walking the list all the way from 1 to wherever.

Then, too, we're not generating the Xs, but operating purely in the realm of Ys, which I'm sure saves some heap.

Profiling output claims that the code is spending 27.4% of its time and 34.7% of its allocations in backwards', and that it is the biggest "cost center" of the program. We made a point of writing it to permit tail call optimization, but perhaps there's another way to improve it.

UPDATE: divMod (or quotRem, depending on the situation) to the rescue!

These return a tuple with the quotient and the remainder (I suspect either the one that C programmers expect or the one that mathematicians expect, respectively), since division frequently hands you the remainder/modulus for free, so a simple tweak of backwards into

backwards n =
    let backwards' n accum
          | n < 10    = accum + n
          | otherwise = let (d, m) = n `divMod` 10
                        in backwards' d (10 * (m + accum))
    in backwards' n 0

took the time output to

real    0m0.779s
user    0m0.756s
sys     0m0.020s

and backwards' down to 13.8% of the time and 15.3% of the allocations. The new major "cost center", as one would expect, is now ysInRange, the function that determines how many of the (generated) Ys are within a specified interval. Let's see whether it's as easy to improve as backwards was. [pause] Probably not, at least not by perusing Hoogle (the Haskell equivalent of Google, i.e. a place to look for information about Haskell's standard modules). Time to create a different data structure, I bet a tree.

UPDATE: Shucks. I remembered a blog post Hacker News linked to about some Haskell code in which the bottleneck was digit-at-a-time conversion, so I took a whack at a fancier digitsIn that went reduced by larger powers of the base first, then took things a digit at a time. It didn't make much difference that I could see.


Popular posts from this blog

Haskell Tool Stack for Ubuntu 16.04

TMTOWTDI, Haskell Style

Fun with pattern matching