Aging Nerd Notes

Monday, August 05, 2013

lastLE cleanup

Remember lastLE? Back when we were trying to determine how many digits/bits it takes to represent a non-negative Integer, we used it to avoid copying a prefix of a list just to grab the value at its end.

lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)

lastLE n xs =
    let lastLE' xs prevVal prevIndex
            | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
            | otherwise    = if prevIndex < 0 then Nothing
                             else Just (prevVal, prevIndex)
    in lastLE' xs (-1) (-1)

It still bugs me. First, the mixed guards and if/else are less than idiomatic Haskell:

lastLE n xs =
    let lastLE' xs prevVal prevIndex
            | head xs <= n  = lastLE' (tail xs) (head xs) (prevIndex + 1)
            | prevIndex < 0 = Nothing
            | otherwise     = Just (prevVal, prevIndex)
    in lastLE' xs (-1) (-1)

Second, the head and tail are clumsy:

lastLE n xs =
    let lastLE' (x:xs) prevVal prevIndex
            | x <= n        = lastLE' xs x (prevIndex + 1)
            | prevIndex < 0 = Nothing
            | otherwise     = Just (prevVal, prevIndex)
    in lastLE' xs (-1) (-1)

Third, that accumulator should be strict, so pretend there's a ! in front of prevIndex in that let (which needs the BangPatterns extension turned on).

I think I'd rather see Nothing as the otherwise case.

lastLE n xs =
    let lastLE' (x:xs) prevVal !prevIndex
            | x <= n         = lastLE' xs x (prevIndex + 1)
            | prevIndex >= 0 = Just (prevVal, prevIndex)
            | otherwise      = Nothing
    in lastLE' xs (-1) (-1)

Come to think of it, we can be consistent in our use of pairs:

lastLE n xs =
    let lastLE' (x:xs) !prev@(val, index)
            | x <= n     = lastLE' xs (x, index + 1)
            | index >= 0 = Just prev
            | otherwise  = Nothing
    in lastLE' xs (-1, -1)

Now, one more thing. Can I get rid of the magic numbers there, the -1s? The -1 as initial index needs to be there to get the correct position in the list of powers of the base. The initial val will never be used. If the whole pair were strict all the way down, though, we couldn't pass in ⊥ ("bottom", the undefined value, which is spelled undefined in Haskell). Perhaps

lastLE n xs =
    let lastLE' (x:xs) prev@(val, !index)
            | x <= n     = lastLE' xs (x, index + 1)
            | index >= 0 = Just prev
            | otherwise  = Nothing
    in lastLE' xs (undefined, -1)

would be the way to go. Hey! We never even use that value here--it's used in the caller--so we can write

lastLE n xs =
    let lastLE' (x:xs) prev@(_, !index)
            | x <= n     = lastLE' xs (x, index + 1)
            | index >= 0 = Just prev
            | otherwise  = Nothing
    in lastLE' xs (undefined, -1)

and it should work (sure enough, it does), as well as expressing explicitly that we don't use it. Well, that's one magic number down, at least.
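For anyone who wants to try it, here's a minimal self-contained sketch of where this ends up. Two things in it are my assumptions rather than part of the code above: the helper is renamed go, and there's an extra fall-through clause so that a finite list doesn't hit a pattern-match failure (the real callers pass an infinite list of powers, so they never needed it).

```haskell
{-# LANGUAGE BangPatterns #-}

-- Last element of an ascending list that is <= n, with its index.
-- Sketch only: the fall-through clause also covers [], which the
-- version in the post leaves unhandled.
lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)
lastLE n xs = go xs (undefined, -1)
  where
    -- Keep walking while the next element still fits under n.
    go (y:ys) (_, !index)
        | y <= n = go ys (y, index + 1)
    -- Either the list ran out or the next element is too big.
    go _ prev@(_, !index)
        | index >= 0 = Just prev
        | otherwise  = Nothing
```

With the powers of ten, lastLE 12345 (iterate (10 *) 1) should pick out 10000 at index 4, and lastLE 0 of the same list should be Nothing, since the undefined placeholder is never looked at.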

Sunday, August 04, 2013

Small things...

Good grief... I've let this go for a month. (UPDATE: No, I haven't. It helps to read the dates carefully.)

An option in pattern matching that I wasn't fully aware of: you can use _ when you don't care about what is there, and have no need to refer to it. I guess I did use it on the ByteString readInteger that returns Maybe (Integer, ByteString):

process line = case map B.readInteger (B.words line) of
               [Just (m, _), Just (n, _)] -> numXs m n
               _                          -> -1

But I didn't realize that I could use it in

ysInRange m n d
    | nVal == n = nPos - mPos + 1
    | otherwise = nPos - mPos
    where (_,    mPos) = findFirst' m
          (nVal, nPos) = findFirst' n
          findFirst' x = case findFirst (Ge x, Any) (dDigitYTree d) of
                             Just (Max i, Max j) -> (i, j)
                             Nothing             -> (tenToThe d, numNDigitYs d)

because we need only check for an exact match at the high end.
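Here's a small sketch of both uses of _ that needs nothing outside the Prelude. The names parsePair and spanLength are made up for illustration, and reads stands in for B.readInteger; like the wildcard in process, it simply ignores whatever text is left over after the number.

```haskell
-- Wildcards inside a case alternative: keep the parsed numbers,
-- ignore the unparsed remainder that reads hands back.
parsePair :: String -> Maybe (Integer, Integer)
parsePair line = case map reads (words line) of
    [[(m, _)], [(n, _)]] -> Just (m, n)
    _                    -> Nothing

-- Wildcards in where-bound tuple patterns: only the positions matter.
spanLength :: (Integer, Int) -> (Integer, Int) -> Int
spanLength lo hi = hiPos - loPos + 1
  where (_, loPos) = lo
        (_, hiPos) = hi
```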

Saturday, July 27, 2013

Look for elegance

Check out nomeata's blog post "On taking the last n elements of a list". There's the obvious

takeLast n = (reverse . take n . reverse)

but all that reversing is a lot of work. The temptation is to think imperative and haul out the heavy artillery... but then he realized how it could be done efficiently and idiomatically. I won't put a spoiler here; read the original.

Thursday, July 04, 2013

"Evelyn was a modified dog..."

So, how can we modify the tree to do what we'd like?

As the code stands, it's set up to say that if a is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search. What we want to say is that if you have some type a that has a function key :: a -> b where b is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search based on the result of key. Then, if we have a list of (palindrome, position) pairs, our key function is just fst (the function that gives you the first item in a pair). Now, how to express that in Haskell?

UPDATE: come to think of it, wouldn't that subsume the existing cases? For them,

let key = id

UPDATE: OK, I should have remembered. You can create relationships between type classes. (Take a look at this diagram of relationships among types and type classes in the Standard Prelude.) So, it looks like I can say something like this:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

class Semilattice a where
    meet :: a -> a -> a

class Semilattice k => Keyed r k | r -> k where
    key :: r -> k

The first type class definition is straight from Mr. van Laarhoven's code; a type is a Semilattice type if it has a meet function which satisfies the semilattice requirements for meet.

Keyed is a "multiparameter type class"; it sets up a relationship between a type r (intended to suggest "record") and a type k (intended to suggest "key"). "r -> k" is a "functional dependency". Those two features aren't in the Haskell 98 standard, but GHC supports them (if they're enabled in a pragma) and I suspect they're in Haskell'.
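To see how the two classes might fit together, here is a rough sketch. Everything past the two class declarations is my own invention for illustration: the Max wrapper (modeled on the one that shows up in the findFirst results), the Entry record, and the summary function are not from Mr. van Laarhoven's code, and the instance needs FlexibleInstances on top of the pragmas above.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Semilattice a where
    meet :: a -> a -> a

class Semilattice k => Keyed r k | r -> k where
    key :: r -> k

-- Hypothetical key type: the meet of two keys is the larger one.
newtype Max a = Max a deriving (Eq, Show)

instance Ord a => Semilattice (Max a) where
    meet (Max x) (Max y) = Max (max x y)

-- Hypothetical record: a palindrome plus its position in the sorted list.
data Entry = Entry { palindrome :: Integer, position :: Int }

-- Only the palindrome takes part in the semilattice; position rides along.
instance Keyed Entry (Max Integer) where
    key = Max . palindrome

-- What a branch node would store: the meet of the keys beneath it.
-- (foldr1 means this is only defined for non-empty lists.)
summary :: Keyed r k => [r] -> k
summary = foldr1 meet . map key
```

The functional dependency r -> k is what lets summary work out the key type from the record type alone.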

(Historical aside: you might not realize that the preceding paragraphs use terms that originate with Algol 68: "standard prelude" and "pragma". Algol 68 is a language that suffered from a lot of undeserved bad press; you really should read C.H. Lindsey's excellent paper on its history, and check out Algol 68 Genie. For those of you interested in functional programming, I should note that partial parametrization was up for addition to the standard and is implemented in Algol 68 Genie.)

So, I will proceed on this basis. It should be a simple rewrite of the Semilattice tree code, though I wish there were a way to have one version of it to do everything; code reuse is a Good Thing. The above lines make it past ghci, so I hope it's a good start. (Then, of course, the question is whether it really does help the Fair and Square program! OTOH, in a way I don't care, because the real goal was to learn more Haskell, not to mention that I was just lucky that the original blog post gave an example in which the Semilattice type was a pair.)

Tuesday, July 02, 2013

What next?

Yesterday it occurred to me that the profiling output might be more accurate if I just asked for -p, rather than always asking for the heap tracking. The first part of the profile output:

    Mon Jul 1 22:23 2013 Time and Allocation Profiling Report (Final)

       ultimatepalindrome20 +RTS -p -RTS

    total time =        0.17 secs   (167 ticks @ 1000 us, 1 processor)
    total alloc = 116,700,576 bytes (excludes profiling overheads)

COST CENTRE            MODULE                %time %alloc

fromList.(...)         SemilatticeSearchTree 19.2   26.8
ysPair.noTwos          Main                   11.4    8.9
floorSqrt.floorSqrt'.y Main                   10.2    5.6
choices                Main                    7.8   10.7
process                Main                    7.2   11.0
fromList               SemilatticeSearchTree   6.6    0.6
ysPair.spread.(...)    Main                    4.8    2.4
ysPair.noTwos'         Main                    4.2    4.2
meet                   SemilatticeSearchTree   3.6    0.0
satisfy                SemilatticeSearchTree   3.0    0.0
main                   Main                    1.8    1.0
numYs                  Main                    1.8    0.1
ysPair.pairSum         Main                    1.8    3.5
makeYTree              Main                    1.8    4.1
bDigits                Main                    1.8    0.0
meet                   SemilatticeSearchTree   1.8    3.5
mkBranch               SemilatticeSearchTree   1.8    3.2
ysPair.noTwosChoices   Main                    1.2    0.7
bDigits.bDigits'       Main                    1.2    1.9
bound                  SemilatticeSearchTree   1.2    0.0
ysPair.noTwos'.base    Main                    0.6    1.2
ysPair.spread          Main                    0.6    2.5
ysPair.twoNPlus1List   Main                    0.0    2.9

The cost of ysPair is spread out over several lines; add them up and you get 32.4% of the time (choices count; there's only one caller) and 37% of the allocation... but then, the rest of the profile does the adding for us; let's check it out. OK, I was pretty close. fromList with its descendants takes up about another third of the time and allocation.

floorSqrt and ceilSqrt should be low-hanging fruit. As we've said before, it converges quadratically, doubling the number of valid bits each time around... but for numbers of up to 330 bits, that's as many as nine iterations starting from just one good bit as floorSqrt does. So, the temptation is to use the double precision square root directly where possible, and use it as a first approximation otherwise. With 52 good bits as a starting point, three iterations will do the job for values up to a googol.

On the other hand, coming up with some way to use the semilattice search trees to just search on one value (and only store the maximum value searched for as the meet!), with the rest of the node along for the ride, would save time and memory and be more generally applicable, and would help me with Haskell in general. That's the way to go.

Saturday, June 29, 2013

Dueling with the input

Still thinking about how to keep those combinations (or palindromes, now that we see we can avoid some sorting if we go ahead and generate them) around only as long as necessary. One thing that's kind of tempting is to take advantage of what we know about when we can reuse these values--we can generate them for one of 2 * k and 2 * k + 1 and then reuse them (with some modification) for the other. So, why not generate the trees two at a time?

"Well, you don't know that you'll need both of them," you object. "The Code Jam people could make you waste your time by generating input that only asks about ranges corresponding to Ys with even numbers of digits (or odd, take your pick)." And you're right, they could. Our attempts to speed things up can be subverted... but I think this one is worth a try.

At least initially, I'd be inclined to generate a list of pairs of trees, with the first being for 2 * k digits, the second being for 2 * k + 1. (Yes, one-digit Ys are a special case again.) More news as it happens.

UPDATE: new best time output:

real    0m0.192s
user    0m0.160s
sys     0m0.028s

Total allocation is down to not quite 117 MB, compared with 124 MB before, and "maximum residency" is around 9.6 MB. GC time is down to 41.2% of total execution time. I guess that's better than almost half, but that still seems high. One thing that is gratifying: the first time I used -sstderr to save GC info, it listed almost 160 MB copied during GC and over 16 MB maximum residency. Now the maximum copying during GC is just 63 MB, less than half what it was before.

And that's using spread, which is expensive because it does Integer divides. We'll do that next, and then I really should try to ditch the gratuitous meets of values in the trees that we never search for. That should be pure gravy.

UPDATE: spread is worth it to avoid having to sort again. (Before we weren't taking advantage of that, so spread was overhead.)

Improvements

Chaddaï Fouché kindly responded to a query I put out on the haskell-beginners mailing list, suggesting:

using iterate
using Data.Vector or MemoTrie instead of lists where possible

What's iterate, you ask?

iterate :: (a -> a) -> a -> [a]

Give it a function f and a starting value s and it will hand you back

[s, f s, f (f s), f (f (f s)), ...]

So, for example, rather than

powersOfTen = map (10 ^) [0..]

we can write

powersOfTen = iterate (10 *) 1

and instead of

bigTwos = map (2 ^) powersOfTwo

we can write

square n = n * n

bigTwos = iterate square 2

Not shabby, eh?
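If you'd like to convince yourself that the two spellings agree, here's a sketch that puts them side by side. The primed names, and the definition of powersOfTwo (which isn't shown in this post), are my assumptions.

```haskell
-- Both ways of writing the powers of ten.
powersOfTen, powersOfTen' :: [Integer]
powersOfTen  = map (10 ^) [(0 :: Int) ..]
powersOfTen' = iterate (10 *) 1

-- Assumed definition: 1, 2, 4, 8, ...
powersOfTwo :: [Integer]
powersOfTwo = iterate (2 *) 1

-- 2^1, 2^2, 2^4, 2^8, ...: each entry is the square of the one before.
bigTwos, bigTwos' :: [Integer]
bigTwos  = map (2 ^) powersOfTwo
bigTwos' = iterate square 2
  where square n = n * n
```

Comparing take 8 of each pair in ghci is a quick sanity check; the iterate versions do one multiplication per element instead of a fresh exponentiation.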

About those Vectors: they're a data structure that makes for O(1) (i.e. constant time) indexing, as opposed to the O(n) time for lists. There are two flavors: Data.Vector and Data.Vector.Unboxed. The unboxed version has lower overhead, but can't be used on all types.

So I added

import qualified Data.Vector as V

and changed a little bit of code:

          memoPair   = V.generate halfN (\i -> tenToThe i + tenToThe (n - 1 - i))
          pair i     = memoPair V.! i
          pairSum v = V.foldl' (+) shell (V.map pair v)

choices :: Int -> Int -> [V.Vector Int]

m `choices` n
    | n == 0           = [V.empty]
    | m == n           = [V.enumFromStepN m (-1) m]
    | otherwise        = [m `V.cons` c | c <- (m - 1) `choices` (n - 1)]
                         ++ ((m - 1) `choices` n)

and indeed it helped. I hope it will help more soon; I can't have an unboxed vector of Integers (or is that a vector of unboxed Integers?), so if I want the choices Ints unboxed, I can't use the unboxed vector map; I'll have to roll my own function for that.
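In case the Vector calls obscure what choices actually computes, here is a plain-list stand-in (my own rewrite for illustration, not the code that was profiled). m `choices` n lists every way of picking n values from 1..m, each in descending order; like the original, it assumes 0 <= n <= m.

```haskell
-- List-based sketch of choices: all n-element subsets of [1..m],
-- each written in descending order. Assumes 0 <= n <= m.
choices :: Int -> Int -> [[Int]]
m `choices` n
    | n == 0    = [[]]
    | m == n    = [[m, m - 1 .. 1]]
    | otherwise = [m : c | c <- (m - 1) `choices` (n - 1)]
                  ++ ((m - 1) `choices` n)
```

For example, 3 `choices` 2 comes out as [[3,2],[3,1],[2,1]]: first the picks that include 3, then the ones that don't.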

A bigger payoff, for now, had to do with sorting, or the minimizing thereof. It's easy to make oneTwos come out in order; noTwos is the gotcha. We did the following:

nDigitYs n = (merge (oneTwos n) (noTwos n)) ++ twoTwos n
    where twoTwos n
              | even n    = [twoShell]
              | otherwise = [twoShell, twoShell + tenToThe halfN]
              where twoShell = 2 * shell
          oneTwos n
              | even n    = []
              | otherwise = map (+ (shell + 2 * tenToThe halfN))
                                (0 : map pair [halfN - 1,halfN - 2..1])
          noTwos n
              | even n    = base
              | otherwise = merge base [p + tenToThe halfN | p <- base]
              where base = sort $ map pairSum (noTwosChoices !! (halfN - 1))

and poof! The profiler output shows total time as 174 ticks, 0.17 seconds. The best time output is now

real    0m0.214s
user    0m0.176s
sys     0m0.028s

No reduction in the percentage of time used for garbage collection, though.
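The merge that nDigitYs leans on isn't shown above. I'm assuming it is the usual merge of two already-sorted lists, something along these lines:

```haskell
-- Merge two ascending lists into one ascending list.
-- Assumed definition; the post uses merge without showing it.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xxs@(x:xs) yys@(y:ys)
    | x <= y    = x : merge xs yys
    | otherwise = y : merge xxs ys
```

That's linear in the combined length and lazy, which is the whole point of getting oneTwos and noTwos to come out in order: one merge instead of a sort over everything.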

You know... now it may be worth keeping palindromes rather than combinations, because we would only have to sort them once instead of twice--and that might cut the memory usage as well.

Sunday, June 23, 2013

So, how am I being wasteful?

Let me count the ways:

We're keeping the combinations for noTwos (2 * k) and noTwos (2 * k + 1) around forever. How can we let go of them when both those have been calculated? The simplest way would be to generate nDigitYs (2 * k) and nDigitYs (2 * k + 1) at the same time, but that has the potential to make us do extra work. This is Haskell, after all; laziness is a virtue. There has to be a better way.
What is up with pairSum? (I renamed it to fit convention.) It's using up 11% of the time and nearly 12% of storage allocations; the graph shows it accumulating two megabytes of heap. Are we piling up thunks?

Alas, this isn't as simple as the Real World Haskell Chapter 25 example. There you know that accumulating and counting items on a list, which is all the example program does, ought to be doable in a constant amount of memory. We, on the other hand, are accumulating things as we go. Ideally, all we accumulate would be

The trees for the various numbers of digits that we actually have to search in
The memoized partial sums of numNDigitYs

Of course, we are also keeping around powers of ten and two, and, right now, the lists of lists of Ints for noTwos.

The immediate temptation is to make numXs take three parameters and return two values, with the added input and output being the extranea that we want to keep around for a while (initially empty, of course, and the added output of one call being passed in as the added input for the next). That seems ugly, though; the exact opposite of information hiding. I'm sure someone's thought of this sort of situation and dealt with it; I just have to learn about it. In the meantime, there's still pairSum to optimize.

Saturday, June 22, 2013

In the great tradition of nibbling at the edges...

...and because I am still puzzling over why pairsum is grabbing so much RAM, let's contemplate floorSqrt. We're just calling it 2,000 times, and yet it's collectively taking up over six percent of the CPU time and five percent of allocation?!

Let's remind ourselves of floorSqrt (tweaked because I forgot that I'd memoized powers of two)

floorSqrt :: Integer -> Integer

floorSqrt 0 = 0
floorSqrt n = floorSqrt' (twoToThe ((1 + bitsIn n) `div` 2))
    where floorSqrt' x =
              let y = (x + n `div` x) `div` 2
              in if y >= x then x else floorSqrt' y

This is a special case of the Newton-Raphson method that was known to the ancient Greeks. We treat zero specially to avoid one of the gotchas of Newton-Raphson, but aside from that, this particular example is well-behaved, and has what they call quadratic convergence: once you're sufficiently close, each successive guess is good to twice as many digits as the one before. Our first guess is good to one bit, so the next is good to two bits, then four, then eight, etc. We're dealing with Xs in ranges up to a googol, or around 2³³⁰, which would mean no more than nine iterations per value. Checking the profiling output, we see a total of 7,175 + 7,472 = 14,647 calls to floorSqrt', which is within our constraint, but darned close to it, so the inputs are tending to the high end of the range.

I kind of hate to cheat, but let's see what happens with

floorSqrt n = floorSqrt' $ (floor . sqrt . fromIntegral) n

[pause to compile, run, check profile output, and check... uh-oh.]

What happens is that the output is wrong, which is weird. You'd think that would give you 52 good bits, and thus just need three iterations, but then, look at that termination condition. There has to be something about it that needs that first trial value to make it work, so you can't just plug in another, even better, guess. Taking the first value from the large input, the floorSqrt we've been using returns

10000010000100100100100001000000

while using the supposedly better first guess gives us

10000010000100100017503961350144

We could change the function to take ceil log2 (# bits in n / 52) iterations, but I think I should get back to dealing with memory usage and pairsum. (Even with that change you probably would end up flipping a coin for whether you got the floor or the ceiling of the square root of n.)
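One plausible explanation, which I haven't checked against the full program: the test y >= x only works when the guess starts at or above the true root and walks down. The power-of-two guess is always too big, but the Double guess can land below the root, in which case the very first step goes up and the function returns the too-small guess untouched. That would fit the second number above being smaller than the first. A sketch of a repair, under that assumption, is to take one unconditional Newton step first, since a step from any positive guess lands at or above the floor of the root:

```haskell
-- Integer square root seeded from the Double square root.
-- Sketch under the assumption described above; not the code from the post.
floorSqrt :: Integer -> Integer
floorSqrt n
    | n < 0     = error "floorSqrt: negative argument"
    | n == 0    = 0
    | otherwise = descend (step seed)
  where
    -- About 52 good bits for n in Double range; may be too low or too high.
    seed = max 1 (floor (sqrt (fromIntegral n :: Double)))
    -- One Newton step. By the AM-GM inequality the result is never
    -- below the floor of the true square root.
    step x = (x + n `div` x) `div` 2
    -- From at or above the root, walk down until the guess stops shrinking.
    descend x =
        let y = step x
        in if y >= x then x else descend y
```

The cost is one extra step compared with trusting the seed, which still leaves only a handful of iterations for values up to a googol.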

Thursday, June 20, 2013

Now that we have better data...

...let's see what's happening.

First, the single sample point faked us out, making us think we were using less RAM than we really were. We are chewing up pretty nearly ten megabytes--still less than that C program, but a bit disappointing.

Or maybe more disappointing than we thought. Here's the output describing heap usage from a run with the options +RTS -sstderr:

     202,392,800 bytes allocated in the heap
      84,601,736 bytes copied during GC
      12,550,176 bytes maximum residency (8 sample(s))
         220,824 bytes maximum slop
              30 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed) Avg pause Max pause
Gen 0       387 colls,     0 par    0.10s    0.10s     0.0003s    0.0009s
Gen 1         8 colls,     0 par    0.06s    0.06s     0.0080s    0.0230s

INIT    time    0.00s ( 0.00s elapsed)
MUT     time    0.19s ( 0.20s elapsed)
GC      time    0.16s ( 0.16s elapsed)
RP      time    0.00s ( 0.00s elapsed)
PROF    time    0.00s ( 0.00s elapsed)
EXIT    time    0.00s ( 0.00s elapsed)
Total   time    0.36s ( 0.36s elapsed)

%GC     time      45.2% (45.3% elapsed)

Alloc rate    1,028,021,469 bytes per MUT second

Productivity 54.8% of total user, 55.0% of total elapsed

Whoa. Nearly half our time spent garbage collecting? There's got to be a way to improve that. (And on the graph, nearly two megabytes taken by pairsum?)

Learning to avoid cabal--or not

Well... last night I thought I'd install EclipseFP, a package for Eclipse to support Haskell development. When you fire it up, it goes looking for packages it wants, and apparently uses cabal to install them. It did; I watched it do so for some time.

It turned out to be a waste of time; when I fired up Eclipse (which I'm rather new to) and clicked on the little lambda over to the left, a window opened up that looked half-drawn and very broken. I suspect that was issues with Eclipse--perhaps I should wipe the latest version that I grabbed and installed, and settle for the ancient version that, for some reason, is what Ubuntu has in its repositories.

OK, so I'll pass on an IDE for Haskell for now, or start up with leksah.

This morning, I had one of those sudden realizations that you get that make you laugh at yourself. Why do those memory usage graphs look like pyramids? Because the default sample interval is 0.1 seconds, and I have the run time down around 0.2 seconds, sort of like taking a sample every couple of minutes and expecting to get an accurate playback of a song.

A peek at Real World Haskell and I see the option to change the interval, so I recompile and run... and it claims that I should link with an option to turn on the RTS capabilities.

What?!

Another compile or two to make sure I did indeed specify the right options... and then some Googling, because the same thing still happened.

From the Google results, I suspect that all those cabal installs pulled in versions of libraries that don't support profiling, and that ghc is pulling them in. Following people's advice, I rm -rf ~/.ghc. No luck, still can't profile.

I have learned one thing: I am not going to let cabal touch my computer ever again. I would have sworn that I specified that packages should just be installed for me, so the rm should have done the trick. I guess I can look for Haskell libraries dated yesterday and delete every single one of them.

UPDATE: BZZT! Turns out the issue comes from trying to use that -i option to override the heap measurement interval. That's what's giving me the problem. I was wrong... and maybe I'll consider using cabal... sometime when I know Haskell much better than I do now.

UPDATE: Found it... since ghc 7.0, you have to compile with -rtsopts (or -rtsopts=all) to be able to use some (OK, most) RTS options, lest the logging they permit be used to breach security. I will have to try to figure out how -i would give you the opportunity to do so over and above the output that -h*, which doesn't require -rtsopts=all, allows. (OK, maybe it would be possible to leak one bit per [interval] by allocating a lot of RAM or not: a lot of RAM means a 1, not much means a 0.)

hlint

Back in the early days of Unix, when the PDP-11/70's main advantage over the later 8/16-bit 6809 was having separate I/D (instruction and data) space, so that you could have 64K of code with access to 64K of data, the virtue of the Unix Way of small programs that did one thing and did it well was a necessity. One of the things it gave rise to was a separate program, "lint", to check C source code for constructs that might be evidence of a coding error, so that the compiler could concentrate on simply generating code.

Nowadays, C compilers often do some of the checking that was once delegated to lint (though separate lint programs still exist, and are very good--e.g. splint, or Gimpel Software's excellent products).

What about Haskell? Well, for Haskell there's hlint. It will give you advice on the Haskell source that you feed it.

jejones@eeyore:~/src/haskell_play$ hlint ultimatepalindrome14.hs
ultimatepalindrome14.hs:236:13: Warning: Use zipWith
Found:
map showsResult $ zip [1 ..] (map process (tail $ B.lines s))
Why not:
zipWith (curry showsResult) [1 ..] (map process (tail $ B.lines s))

ultimatepalindrome14.hs:237:5: Error: Use .
Found:
mapM putStr $ map ($ "\n") r
Why not:
mapM (putStr . ($ "\n")) r

2 suggestions
jejones@eeyore:~/src/haskell_play$

So there are (at least!) a couple of places I could have written arguably better, easier to read, more idiomatic Haskell. Pretty cool.

What's with that "."? In that context, "." is the function composition operator; given two functions, one of which returns values of the type the other takes as input, composing them gives you a function that applies first one function, then the other. Huh? Better to write it in Haskell:

(.) :: (a -> b) -> (c -> a) -> (c -> b)

(f . g) x = f (g x)

or, equivalently,

f . g = \x -> f (g x)

\ isn't a character escape; it's the closest the creators of Haskell could come to the Greek letter lambda, as in Alonzo Church's "lambda calculus". A lambda expression is an "anonymous function"; the way to read the right hand side of that last line is "the function that, given an argument x, returns f (g x)".

(Ironically, the lambda was in turn a sort of best approximation; it was inspired by the use of the circumflex in the notation of Russell and Whitehead's Principia Mathematica. Details here.)
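To see the two hlint suggestions in action, here is a cut-down sketch. The showsResult below is a made-up stand-in (the real one isn't shown in this post), and I've kept everything pure so the before and after versions can be compared directly.

```haskell
-- Hypothetical stand-in for the program's showsResult.
showsResult :: (Int, Integer) -> ShowS
showsResult (i, n) =
    showString "Case #" . shows i . showString ": " . shows n

-- What hlint found: zip, then two separate maps.
before :: [Integer] -> [String]
before results = map ($ "\n") (map showsResult (zip [1 ..] results))

-- What hlint suggested: zipWith with a curried function, and the two
-- maps fused into one by composing the functions being mapped.
after :: [Integer] -> [String]
after results = map (($ "\n") . curry showsResult 1 `seq` id) []
             ++ zipWith (\i n -> ($ "\n") (curry showsResult i n)) [1 ..] results
```

Both should produce the same list of lines for any input; the second form just says it with fewer intermediate lists.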

Anyway... there's also a Haskell "style scanner" that one can either just get suggestions from or use in the fashion of indent. I will have to check it out.

Wednesday, June 19, 2013

It's not just an idiom...

In this program I've taken a list of values and fed it to

zip [1..]

You'll recall that zip takes two lists and returns a new list as long as the shortest of the lists handed to it. Each element of the new list is a pair of values at corresponding positions in the lists, so that, for example,

zip [1..] "hiya" == [(1, 'h'), (2, 'i'), (3, 'y'), (4, 'a')]

Remind you of anything you learned to do as a kid? Yes, it's constructing a bijection from a set (here represented as a list) to the first however many counting numbers, aka counting. Little did I know back then that I was preparing for Haskell.

Tuesday, June 18, 2013

Further restructuring

It occurred to me that I could push the code even further, making the connection between counting and generation more apparent. Recall the counting code:

numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

Here's the new generation (which sounds like some cheesy 60s thing--sorry!); we've gotten rid of justOnes.

nDigitYs 1 = [1,2,3]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
    where halfN = n `div` 2
          pair i = tenToThe i + tenToThe (n - (i + 1))
          twoTwos n
              | even n    = [twoTwosNoOnes]
              | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe halfN]
              where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2
          oneTwos n
              | even n    = []
              | otherwise = map (+ common)
                                 (0 : [pair i | i <- [1..halfN - 1]])
              where common = pair 0 + 2 * tenToThe halfN
          noTwos n
              | even n    = base
              | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
              where pairsum xs = foldl' (+) (pair 0) (map pair xs)
                    base       = map pairsum (noTwosChoices !! (halfN - 1))

noTwosChoices = [concat [n `choices` k | k <- [0..min 3 n]] | n <- [0..]]

We explicitly generate the oneTwos values in a way that makes clear that there are n `div` 2 of them, and it's similarly clear that numNoTwos is correct. Life is beautiful, right?
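For the skeptical, the counting formula can be checked against brute force for small n. This sketch assumes a straightforward choose (its definition isn't shown in this post) and adds a hypothetical bruteCount that simply tests every n-digit number for being a palindrome whose square is also a palindrome.

```haskell
-- Assumed definition of the binomial coefficient used by numNDigitYs.
choose :: Int -> Int -> Int
n `choose` k
    | k < 0 || k > n = 0
    | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- The counting code from the post.
numNDigitYs :: Int -> Int
numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

isPalindrome :: Integer -> Bool
isPalindrome x = s == reverse s
    where s = show x

-- Hypothetical check: count the n-digit Ys the slow way.
bruteCount :: Int -> Int
bruteCount n = length [y | y <- [10 ^ (n - 1) .. 10 ^ n - 1],
                           isPalindrome y, isPalindrome (y * y)]
```

For one through five digits, both map numNDigitYs [1..5] and map bruteCount [1..5] should give [3,2,5,3,8].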

Well, not quite. We're now eating up ten megabytes of memory instead of six and change; total allocation is up from 127 MB to 140 MB, and total time, according to the profiling output, is up by .03 seconds. Saving lists of lists of Ints will definitely add to the allocation, so we would expect that. We're really beating on pair and pairsum, so that's probably where to look to get resource usage pared (no word play intended) back down.

UPDATE: We didn't memoize pair; doing that took us back down to 0.2 seconds, cutting the difference back down to .01 seconds. Total allocation is back down to 130 MB, but memory is up somewhere between 10 and 11 MB.

UPDATE: I wonder whether I forgot something, perhaps -fllvm. Recompiling and rerunning shows 0.19 seconds run time in profiler output, and just about 8 MB of memory usage (as opposed to total allocation, still around 130 MB).

I hate to do something as crass as packing three values in an Int or maybe an Int32, but it's grating to keep a list of lists of Ints hanging around, too. I'll give it a try. (The results were slower and used more RAM. So much for that.)

Monday, June 17, 2013

Maybe we're saving the wrong thing

Maybe we should keep the lists of combinations around instead of the base values; that would get us out of the divMod business, though OTOH it means hanging on to those lists; as it stands, I bet that they're being garbage collected.

The problem is that the bitmap moves us back to O(n), where n is the number of digits, rather than a fixed amount of work (because there's a fixed upper bound on the number of nonzero digits). We could roll our own bit fields, and stuff up to three bitsIn (n `div` 2) values into an Int or an Integer (actually, Haskell has Int8, Int16, Int32, and Int64 types to choose from; 21 bits a shot would allow for up to two million digits, times two for both halves of the palindrome--that's a lot bigger than the second official data set needs), but that seems kind of cheesy.

I'll have to think about that some more. In the meantime, I restructured the code a little bit to make the code that generates Ys more clearly parallel to the code that just counts them. The counting code:

numNDigitYs :: Int -> Int

numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

The generating code:

nDigitYs :: Int -> [Integer]

nDigitYs 1 = [1,2,3]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
    where twoTwos n
              | even n    = [twoTwosNoOnes]
              | otherwise = [twoTwosNoOnes,
                             twoTwosNoOnes + tenToThe (n `div` 2)]
              where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2
          oneTwos n
              | even n    = []
              | otherwise = map (+ 2 * tenToThe halfN)
                                 (justOnes n (min 1 (halfN - 1)))
              where halfN = n `div` 2
          noTwos n
              | even n    = base
              | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
              where halfN = n `div` 2
                    base = justOnes n (min 3 (halfN - 1))

Yeah, I guess that shows I'm seriously at the point of diminishing returns.

Saturday, June 15, 2013

Nomenclature, and data structures

First, I should correct my terminology. All computers are binary these days; Setun was a one-shot, and the days of BCD-based computers are long gone.

This is the case of noTwos, and we've already dealt with the one case where a 3 can appear, so all the digits are either zero or one. So there's an obvious correspondence between these decimal palindromes and binary palindromes, pairing palindromes whose printable representations in their respective bases are equal.

Of course, n-digit binary numbers map exactly to the elements of the power set of a set of n elements, so we can represent our generated combinations (vide the function choices) as binary numbers. This is pretty tempting as an alternative to a [Int], but I don't want to iterate over all those bits to see which are set. After all, if we're just doing half of the palindrome, and we know the most significant digit has to be 1, that leaves at most three bits on.

Maybe we can use a trick: consider a number n of a type in the Bits class. What can we say about

n .&. (n - 1)?

(The Haskell bitwise "and" and "or" are a little funky-looking.)

Well, if n == 0, the result is zero; otherwise, n has some bits set, and one of them is the least significant set bit. n - 1 will have a zero in that position, ones in all the positions below it, and the more significant bits unchanged. The "and" gets rid of those trailing ones in case the least significant set bit of n isn't in the ones position.

So, you can write an lsb function:

import Data.Bits

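-- Isolate the least significant set bit of n (zero if n == 0).
-- Num is needed for n - 1; as of base 4.6, Bits no longer implies it.
lsb :: (Bits a, Num a) => a -> a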

lsb n = n `xor` (n .&. (n - 1))

and then peel the bits off one at a time. To do that, we shouldn't just give back the LSB, because we're generating the next value in the sequence while we're at it, so

-- Split n into its least significant set bit and everything else.
bitStrip :: (Bits a, Num a) => a -> (a, a)

bitStrip n = (lsb, leftovers)
    where leftovers = n .&. (n - 1)
          lsb = n `xor` leftovers

and we can pull them off until we end up with zero...but we still have to map over to base 10; that lsb is 2^i for some i. I'm not sure whether the LSB trick will help any.
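
Here's a minimal sketch of what that peeling and mapping might look like. setBits and binaryToDecimal are hypothetical names, not anything from the real code, and the bitStrip here carries a Num constraint so it compiles against a current base. countTrailingZeros recovers the i in 2^i.

```haskell
import Data.Bits (Bits, FiniteBits, countTrailingZeros, xor, (.&.))
import Data.List (unfoldr)

-- Split n into its least significant set bit and everything else.
bitStrip :: (Bits a, Num a) => a -> (a, a)
bitStrip n = (lowest, leftovers)
    where leftovers = n .&. (n - 1)
          lowest    = n `xor` leftovers

-- Hypothetical helper: peel the set bits off one at a time, lowest first.
setBits :: (Bits a, Num a) => a -> [a]
setBits = unfoldr step
    where step n
              | n == 0    = Nothing
              | otherwise = Just (bitStrip n)

-- Hypothetical helper: reread the binary digits of n as decimal digits,
-- so each set bit 2^i contributes 10^i.
binaryToDecimal :: (FiniteBits a, Num a) => a -> Integer
binaryToDecimal n = sum [10 ^ countTrailingZeros b | b <- setBits n]
```

For example, setBits (11 :: Int) gives [1,2,8], and binaryToDecimal (11 :: Int) gives 1011. The work is proportional to the number of set bits rather than the number of digits, which is the whole attraction; whether it beats the list-based version is still an open question.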

Thursday, June 13, 2013

Of course there's a pattern

I can't believe I didn't see it before. Here's the deal:

You can get from the base values for 2 * k digits to those for 2 * k + 1 digits as we described earlier.
You can get from the base values for 2 * k digits to some of those for 2 * k + 2 digits.

If you take all the base values for 2 * k digits and make a two-digit gap between their halves, you have all the base values for 2 * k + 2 digits that don't have ones in the gap. The ones that do have ones in the gap have, outside the gap, the halves of 2 * k digit palindromes that have one less one per half than the base values for 2 * k digits, because only that way do you get ones left over to fill the gap.
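
The gap-making half of that is easy to sketch. widen is a made-up name, and tenToThe is a stand-in defined here so the snippet stands alone; filling the gap with ones is not attempted.

```haskell
-- Stand-in for the tenToThe used elsewhere in these posts (an assumption).
tenToThe :: Int -> Integer
tenToThe k = 10 ^ k

-- Open a two-digit gap of zeros between the halves of a 2 * k digit value.
widen :: Int -> Integer -> Integer
widen k x = hi * tenToThe (k + 2) + lo
    where (hi, lo) = x `divMod` tenToThe k
```

With k = 2, map (widen 2) [1001,1111] gives [100001,110011], the two six-digit base values with zeros in the gap; 101101 and 111111 would have to come from the other route.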

The 65536-dollar question, then, is whether it's worthwhile to generate them this way. (And of course, you can generate the ones with one less one per half for 2 * k digits from those with two less ones per half for 2 * k - 2 digits. That doesn't go on forever, to be sure; since the added ones per half is no bigger than three, you're down to the base case of 100...001 pretty quickly. Again, though, is it worth it?)

One thing to consider that might help out: doing justOnes and noTwos sorts of things purely in binary and moving back to decimal only at the very end. If we take this route, it would also be worth dealing purely with the upper halves of the no-one Ys until it's time to convert. Consider: dealing with the Ys takes you down from at most 100 decimal digits to at most 50 decimal digits. Dealing with just the upper half takes you down to at most 25 decimal digits. Doing the stuff where the decimal digits are restricted to ones and zeros means you can operate on values with at most 25 bits until the very end. Save the arbitrary precision for where you need it, right? Even if, like me, you kind of hate setting arbitrary limits, in practice it will mean that your Integers will actually, for purposes of this problem, be small enough that operations on them will be, if they're implemented by a list or array of fixed-size hunks, the fastest kind, because there will only be one hunk to mess with.

(OK, I admit I've limited some things in my code to Int rather than Integer. OTOH, I think the (29-bit) Int values suffice, since there are only something like 51,000 of these palindromes up to a googol, for heaven's sake. I suspect, though I haven't calculated, that there aren't 2^29, or around 512 million, of the palindromes until a pretty darned big upper bound. OK, maybe it's 256 million--signed, right?--but that's still on up there.)

I'll have to think about this some more.

UPDATE: I pulled the counting routines, tweaked them to take and return Integer, and then did

last $ takeWhile (\(x, y) -> y <= 2^29) (zip [1..] nDigitYsPartialSums)

Turns out you have to get past 10^515 before the number of Ys risks going past 2^29 (and hence needing an Integer to count them), and thus the corresponding Xs would be up around 10^1030.
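
For anyone who wants to repeat the experiment, here's a minimal sketch of the Integer-flavored counting code. The definition of nDigitYsPartialSums is my guess at the obvious one (a running total of the per-length counts), choose is a stand-in, and lastUnder is a made-up wrapper around the one-liner above.

```haskell
-- Stand-in binomial coefficient (an assumption; the real choose isn't shown).
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- The counting code, with Integer in place of Int.
numNDigitYs :: Integer -> Integer
numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos + numOneTwos + numNoTwos
    where numTwoTwos = if even n then 1 else 2
          numOneTwos = if even n then 0 else n `div` 2
          numNoTwos  = if even n then s else 2 * s
          h = n `div` 2 - 1
          s = sum [h `choose` i | i <- [0 .. min h 3]]

-- Assumed definition: the number of Ys with at most 1, 2, 3, ... digits.
nDigitYsPartialSums :: [Integer]
nDigitYsPartialSums = scanl1 (+) (map numNDigitYs [1 ..])

-- The last (digit count, running total) pair whose total is within the limit.
lastUnder :: Integer -> (Integer, Integer)
lastUnder limit =
    last (takeWhile (\(_, y) -> y <= limit) (zip [1 ..] nDigitYsPartialSums))
```

The running totals start 3, 5, 10, 13, 21, 26, and lastUnder (2^29) is the query from the update.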

Not sure whether it helped

With that change, we now have the following for noTwos and associated functions. (Oh, yeah... we made the one-digit case special as we did for counting the d-digit Ys, for the same reason, i.e. putting the special case in one place.)

evenNoTwos = [justOnes n (min 3 (n `div` 2 - 1)) | n <- [2,4..]]

noTwos :: Int -> [Integer]

noTwos n
    | even n    = base
    | otherwise = concat [[p, p + tenToThe halfN] | p <- map spread base]
    where halfN    = n `div` 2
          base     = evenNoTwos !! (halfN - 1)
          spread x = let (a, b) = x `divMod` (tenToThe halfN)
                     in a * tenToThe (halfN + 1) + b

Did it make a difference? Yes, but... it was a bit slower. By profiling output, 193 ticks instead of 187; time output looked worse, with the best apparently

real    0m0.232s
user    0m0.200s
sys     0m0.028s

Recall that for the previous version, the best was

real    0m0.224s
user    0m0.176s
sys     0m0.044s

On the other hand, looking deeper into the profiler output, before, noTwos and the things it called were eating 19.3% of the CPU time, while now that's 7.8%, but to be fair we should include evenNoTwos, which snarfs 7.3%--but still that's a total of 15.1%, which should be a win. Despite total allocation being down, actual memory usage is up, from maybe 6.25 MB to close to 7 MB; we are, after all, keeping more stuff around.

Wednesday, June 12, 2013

Meanwhile, I'm still thinking...

I took a quick look at choices, and I don't see an obvious way to generate them in what would correspond to ascending order. (Though that doesn't mean there isn't one.)

One thing we can do, though... take a look at noTwos, the source of by far the most of the d-digit Ys for all but the smallest d values.

noTwos n
    | n == 1    = [1,3]
    | even n    = base
    | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
    where halfN = n `div` 2
          base = justOnes n (min 3 (halfN - 1))

In particular, look at how base is calculated. Suppose n == 2 * k for some k. Then for n and n + 1, halfN - 1 has the same value and thus min 3 (halfN - 1) has the same value. Sure, n is different, but that just means for n + 1 there's a one digit gap left to either put a one in or leave at zero. Check it out.

*Main> justOnes 2 0
[11]
*Main> justOnes 3 0
[101]
*Main> justOnes 4 1
[1001,1111]
*Main> justOnes 5 1
[10001,11011]
*Main> justOnes 6 2
[100001,101101,110011,111111]
*Main> justOnes 7 2
[1000001,1010101,1100011,1110111]

So, corresponding to a base value x for noTwos (2 * k), the corresponding value for noTwos (2 * k + 1) is

let (a, b) = x `divMod` (tenToThe k)
in a * tenToThe (k + 1) + b

Less work than going through justOnes, I'd say... and it applies whether we figure out a clever way to generate the values in order or not. Note also that this mapping preserves numerical order, so if we do find a way to cleverly generate these values in order, taking advantage of this mapping won't mess that up.
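
A quick sketch to check that mapping against the GHCi output above. spread is just the let expression given a name and an explicit k, and tenToThe is a stand-in so the snippet stands alone.

```haskell
-- Stand-in for the tenToThe used elsewhere in these posts (an assumption).
tenToThe :: Int -> Integer
tenToThe k = 10 ^ k

-- Slip a zero between the halves of a 2 * k digit value.
spread :: Int -> Integer -> Integer
spread k x = a * tenToThe (k + 1) + b
    where (a, b) = x `divMod` tenToThe k
```

map (spread 3) [100001,101101,110011,111111] gives [1000001,1010101,1100011,1110111], matching justOnes 7 2, and the smaller cases line up the same way.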

We can save the base values for even numbers, though it would be nice if we could somehow notice when we do their successors and render them garbage collectable; otherwise that's 15,275 Integers hanging around in lists. Haskell has some memoization types and functions around that go beyond the simple lists as arrays we've been doing, but I'm not sure whether they support something like that.

"You can see a lot by just looking" --Yogi Berra

There's a reason Code Jam urges one to look at examples--in this problem, it points you towards the constraint on the Ys that I didn't see because I only considered the most/least significant digit. And now, we'll see whether it can point us at how we might generate noTwos in ascending numerical order.

noTwos, like oneTwos, is based on the result of justOnes. For now, at least, let's ignore the case of an odd number of digits; for them, justOnes simply returns the same values as for the next lower number of digits with a zero in the middle. Now, to take a look at

putStr $ unlines $ map show (sort (justOnes (2 * n) (min (n - 1) 3)))

for some values of n. Hey, wait; these are palindromes, so let's just look at the top half. After all, the rest is redundant, and it's the top half that determines the ordering.

4:

10
11

6:

100
101
110
111

8:

1000
1001
1010
1011
1100
1101
1110
1111

10:

10000
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
11011
11100
11101
11110

Oops. 10 breaks the pattern, because you can't have five ones. 12 shows a skip in the middle where there would otherwise be five ones, and as n increases, we'll presumably see more of those.

12:

100000
100001
100010
100011
100100
100101
100110
100111
101000
101001
101010
101011
101100
101101
101110
110000 <-
110001
110010
110011
110100
110101
110110
111000
111001
111010
111100

So, if we pretend these are binary numbers, up to eight digits (total), they go from 10...0 to 11...1. Past eight digits, they go from 10...0 to 11110...0, skipping any value that has more than four ones.

So there is a pattern. How to generate it efficiently?

UPDATE: I'm not sure there is a way; it's tempting to take the interval including all the numbers and then filter (\x -> popCount x <= o + 1) but that's a lot of numbers to generate and then mostly throw away; up at the high end for numbers up to a googol, that'd be around 28 million values that you'd grind out and then throw away all but a few thousand... and then you get to create the palindrome from the top half and switch back to base 10 on top of that. I don't think that's going to cut it... so, back to getting the list of lists of ones to come out in what corresponds to ascending order for the values justOnes generates.
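
For the record, here's a minimal sketch of that filter-and-discard approach, fine for reproducing the tables above but not something to run up near a googol. topHalves, toBinary and evenPalindromes are made-up names, and the bound of four ones (the leading one plus at most three more) is written in directly.

```haskell
import Data.Bits (popCount)
import Data.Char (intToDigit)
import Numeric (showIntAtBase)

-- Top halves with h digits, read as binary: leading bit set, at most four ones.
topHalves :: Int -> [Int]
topHalves h = [x | x <- [2 ^ (h - 1) .. 2 ^ h - 1], popCount x <= 4]

-- Write the bits out as a string of zeros and ones.
toBinary :: Int -> String
toBinary x = showIntAtBase 2 intToDigit x ""

-- Mirror each top half and reread the digits in base 10.
evenPalindromes :: Int -> [Integer]
evenPalindromes h = [read (s ++ reverse s) | x <- topHalves h, let s = toBinary x]
```

topHalves 5 has 15 entries and topHalves 6 has 26, matching the lists for 10 and 12 digits, and evenPalindromes 3 gives [100001,101101,110011,111111]. The output is in ascending order for free; the cost is all the candidates generated only to be thrown away.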