Showing posts from June 16, 2013

In the great tradition of nibbling at the edges...

...and because I am still puzzling over why pairsum is grabbing so much RAM, let's contemplate floorSqrt. We're just calling it 2,000 times, and yet it's collectively taking up over six percent of the CPU time and five percent of allocation?!

Let's remind ourselves of floorSqrt (tweaked because I forgot that I'd memoized powers of two)

floorSqrt :: Integer -> Integer

floorSqrt 0 = 0
floorSqrt n = floorSqrt' (twoToThe ((1 + bitsIn n) `div` 2))
    where floorSqrt' x =
              let y = (x + n `div` x) `div` 2
              in if y >= x then x else floorSqrt' y

This is a special case of the Newton-Raphson method that was known to the ancient Greeks. We treat zero specially to avoid one of the gotchas of Newton-Raphson, but aside from that, this particular example is well-behaved, and has what they call quadratic convergence: once you're sufficiently close, each successive guess is good to twice as many digits as the one before. Our first guess is …

Now that we have better data...

...let's see what's happening.

First, the single sample point faked us out, making us think we were using less RAM than we really were. We are chewing up pretty nearly ten megabytes--still less than that C program, but a bit disappointing.

Or maybe more disappointing than we thought. Here's the output describing heap usage from a run with the options +RTS -sstderr:

     202,392,800 bytes allocated in the heap
      84,601,736 bytes copied during GC
      12,550,176 bytes maximum residency (8 sample(s))
         220,824 bytes maximum slop
              30 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       387 colls,     0 par    0.10s    0.10s     0.0003s    0.0009s
  Gen  1         8 colls,     0 par    0.06s    0.06s     0.0080s    0.0230s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.19s  (  0.20s elapsed)
  GC      time    0.16s  (  0.16s elapsed)
  RP      time    0.00s…

Learning to avoid cabal--or not

Well... last night I thought I'd install EclipseFP, a package for Eclipse to support Haskell development. When you fire it up, it goes looking for packages it wants, and apparently uses cabal to install them. It did; I watched it do so for some time.

It turned out to be a waste of time; when I fired up Eclipse (which I'm rather new to) and clicked on the little lambda over to the left, a window opened up that looked half-drawn and very broken. I suspect that was issues with Eclipse--perhaps I should wipe the latest version that I grabbed and installed, and settle for the ancient version that, for some reason, is what Ubuntu has in its repositories.

OK, so I'll pass on an IDE for Haskell for now, or start up with leksah.

This morning, I had one of those sudden realizations that you get that make you laugh at yourself. Why do those memory usage graphs look like pyramids? Because the default sample interval is 0.1 seconds, and I have the run time down around 0.2 seconds, sort…


Back in the early days of Unix, when the PDP--11/70's main advantage over the later 8/16-bit 6809 was having separate I/D (instruction and data) space, so that you could have 64K of code with access to 64K of data, the virtue of the Unix Way of small programs that did one thing and did it well was a necessity. One of the things it gave rise to was a separate program, "lint", to check C source code for constructs that might be evidence of a coding error, so that the compiler could concentrate on simply generating code.

Nowadays, C compilers often do some of the checking that was once delegated to lint (though separate lint programs still exist, and are very good--e.g. splint, or Gimpel Software's excellent products).

What about Haskell? Well, for Haskell there's hlint. It will give you advice on the Haskell source that you feed it.

jejones@eeyore:~/src/haskell_play$ hlint ultimatepalindrome14.hs
ultimatepalindrome14.hs:236:13: Warning: Use zipWith
  map showsRe…

It's not just an idiom...

In this program I've taken a list of values and fed it to

zip [1..]
You'll recall that zip takes two lists and returns a new list as long as the shortest of the lists handed to it. Each element of the new list is a pair of values at corresponding positions in the lists, so that, for example,

zip [1..] "hiya" == [(1, 'h'), (2, 'i'), (3, 'y'), (4, 'a')]
Remind you of anything you learned to do as a kid? Yes, it's constructing an bijection from a set (here represented as a list) to the first however many counting numbers, aka counting. Little did I know back then that I was preparing for Haskell.

Further restructuring

It occurred to me that I could push the code even further, making the connection between counting and generation more apparent. Recall the counting code:

numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos  n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

Here's the new generation (which sounds like some cheesy 60s thing--sorry!); we've gotten rid of justOnes.

nDigitYs 1 = [1,2,3]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
    where halfN  = n `div` 2
          pair i = tenToThe i + tenToThe (n - (i + 1))
          twoTwos n
              | even n    = [twoTwosNoOnes]
              | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe halfN]
              where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2
          oneTwos n

Maybe we're saving the wrong thing

Maybe we should keep the lists of combinations around instead of the base values; that would get us out of the divMod business, though OTOH it means hanging on to those lists; as it stands, I bet that they're being garbage collected.

The problem is that the bitmap moves us back to O(n), where n is the number of digits, rather than a fixed amount of work (because there's a fixed upper bound on the number of nonzero digits). We could roll our own bit fields, and stuff up to three bitsIn (n `div` 2) values into an Int or an Integer (actually, Haskell has Int8, Int16, Int32, and Int64 types to choose from; 21 bits a shot would allow for up to two million digits, times two for both halves of the palindrome--that's a lot bigger than the second official data set needs), but that seems kind of cheesy.

I'll have to think about that some more. In the meantime, I restructured the code a little bit to make the code that generates Ys more clearly parallel to the code that just coun…