Saturday, June 29, 2013

Dueling with the input

Still thinking about how to keep those combinations (or palindromes, now that we see we can avoid some sorting if we go ahead and generate them) around only as long as necessary. One thing that's kind of tempting is to take advantage of what we know about when we can reuse these values--we can generate them for one of  2 * k and 2  * k + 1 and then reuse them (with some modification) for the other. So, why not generate the trees two at a time?

"Well, you don't know that you'll need both of them," you object. "The Code Jam people could make you waste your time by generating input that only asks about ranges corresponding to Ys with even numbers of digits (or odd, take your pick)." And you're right, they could. Our attempts to speed things up can be subverted... but I think this one is worth a try.

At least initially, I'd be inclined to generate a list of pairs of trees, with the first being for 2 * k digits, the second being for 2 * k + 1. (Yes, one-digit Ys are a special case again.) More news as it happens.

UPDATE: new best time output:

real    0m0.192s
user    0m0.160s
sys     0m0.028s


Total allocation is down to not quite 117 MB, compared with 124 MB before, and "maximum residency" is around 9.6 MB. GC time is down to 41.2% of total execution time. I guess that's better than almost half, but that still seems high. One thing that is gratifying: the first time I used -sstderr to save GC info, it listed almost 160 MB copied during GC and over 16 MB maximum residency. Now the maximum copying during GC is just 63 MB, less than half what it was before.

And that's using spread, which is expensive because it does Integer divides. We'll do that next, and then I really should try to ditch the gratuitous meets of values in the trees that we never search for. That should be pure gravy.

UPDATE: spread is worth it to avoid having to sort again. (Before we weren't taking advantage of that, so spread was overhead.)

Improvements

Chaddaï Fouché kindly responded to a query I put out on the haskell-beginners mailing list, suggesting:
What's iterate, you ask?

iterate :: (a -> a) -> a -> [a]

Give it a function f and a starting value s and it will hand you back

[s, f s, f . f s, f . f . f s, ...]

So, for example, rather than

powersOfTen = map (10 ^) [0..]

we can write

powersOfTen = iterate (10 *) 1

and instead of

bigTwos = map (2 ^) powersOfTwo

we can write

square n = n * n

bigTwos = iterate square 2


Not shabby, eh?

About those Vectors: they're a data structure that makes for O(1) (i.e. constant time) indexing, as opposed to the O(n) time for lists. There are two flavors: Data.Vector and Data.Vector.Unboxed. The unboxed version has lower overhead, but can't be used on all types.

So I added

import qualified Data.Vector as V

and changed a little bit of code:

          memoPair   = V.generate halfN (\i -> tenToThe i + tenToThe (n - 1 - i))
          pair i     = memoPair V.! i
          pairSum v = V.foldl' (+) shell (V.map pair v)


choices :: Int -> Int-> [V.Vector Int]

m `choices` n
    | n == 0           = [V.empty]
    | m == n           = [V.enumFromStepN m (-1) m]
    | otherwise        = [m `V.cons` c | c <- (m - 1) `choices` (n - 1)]
                         ++ ((m - 1) `choices` n)


and indeed it helped. I hope it will help more soon; I can't have an unboxed vector of Integers (or is that a vector of unboxed Integers?), so if I want the choices Ints unboxed, I can't use the unboxed vector map; I'll have to roll my own function for that.

A bigger payoff, for now, had to do with sorting, or the minimizing thereof. It's easy to make oneTwos come out in order; noTwos is the gotcha. We did the folliowing:

nDigitYs n = (merge (oneTwos n) (noTwos n)) ++ twoTwos n
    where twoTwos n
              | even n    = [twoShell]
              | otherwise = [twoShell, twoShell + tenToThe halfN]
              where twoShell = 2 * shell
          oneTwos n
              | even n    = []
              | otherwise = map (+ (shell + 2 * tenToThe halfN))
                                (0 : map pair [halfN - 1,halfN - 2..1])
          noTwos n
              | even n    = base
              | otherwise = merge base [p + tenToThe halfN | p <- base]
              where base  = sort $ map pairSum (noTwosChoices !! (halfN - 1))


and poof! The profiler output shows total time as 174 ticks, 0.17 seconds.  The best time output is now

real    0m0.214s
user    0m0.176s
sys     0m0.028s

No reduction in the percentage of time used for garbage collection, though.

You know... now it may be worth keeping palindromes rather than combinations, because we would only have to sort them once instead of twice--and that might cut the memory usage as well.

Sunday, June 23, 2013

So, how am I being wasteful?

Let me count the ways:
  • We're keeping the combinations for noTwos (2 * k) and noTwos (2 * k + 1)around forever. How can we let go of them when both those have been calculated? The simplest way would be to generate nDigitYs (2 * k) and nDigitYs (2 * k + 1) at the same time, but that has the potential to make us do extra work. This is Haskell, after all; laziness is a virtue. There has to be a better way. 
  • What is up with pairSum? (I renamed it to fit convention.) It's using up 11% of the time and nearly 12% of storage allocations; the graph shows it accumulating two megabytes of heap. Are we piling up thunks?
Alas, this isn't as simple as the Real World Haskell Chapter 25 example. There you know that accumulating and counting items on a list, which is all the example program does, ought to be doable in a constant amount of memory. We, on the other hand, are accumulating things as we go. Ideally, all we accumulate would be
  • The trees for the various numbers of digits that we actually have to search in
  • The memoized partial sums of numNDigitYs
Of course, we are also keeping around powers of ten and two, and, right now, the lists of lists of Ints for noTwos.

The immediate temptation is to make numXs take three parameters and return two values, with the added input and output being the extranea that we want to keep around for a while (initially empty, of course, and the added output of one call being passed in as the added input for the next). That seems ugly, though; the exact opposite of information hiding. I'm sure someone's thought of this sort of situation and dealt with it; I just have to learn about it. In the meantime, there's still pairSum to optimize.

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...