A little more stuff

So, we know that things get slow when we actually have to generate some of the Ys for a range check. The range checking is now O(log n) where n is the number of Ys we have to check. twoTwos is trivial, so that leaves oneTwos and noTwos, and they mostly call justOnes. Of course there's also the sorting, the labeling with ordinal position, and the tree building. sort doesn't even appear in the .prof file, so it must be down in the noise (or perhaps considered a primitive; after all, we don't see the additions split out). What we do see in the "cost center" list are
  • nDigitYs
  • fromList
  • digitsIn.digitsIn'
  • justOnes
  • justOnes.pair
  • floorSqrt.floorSqrt'.y
One small change to justOnes turned out to be surprisingly effective: the value returned was calculated as

[shell + sum (map pair c) | c <- innards]

It turns out to chop an easy 10% off the execution time, according to profiling output, to recast that as

[foldl' (+) shell (map pair c) | c <- innards]

"WTF?" you say? Well, folding a list takes several things:
  • a function (we're using (+), the parentheses telling Haskell we're not trying to add a function to an Integer)
  • a starting value (we're using shell)
  • the list to be folded
Folds can get you sums (+) or products (*) of lists of values; you'll often find the identity for the function used as the starting value for that kind of use. On the other hand, the function doesn't necessarily have to have type

a -> a -> a

Instead, you could have a starting value of type b and then the function would have type

b -> a -> b

which you'll see for things like fromList, which builds one data structure from another. Since we want shell plus a sum, we might as well put shell in as the starting value rather than using 0 and adding shell when the smoke clears.

Probably more important is the way that foldl' is strict. Laziness is Haskell's outstanding virtue, but there are times when you want strict, and avoiding bulding up potentially large stacks of thunks awaiting execution for a fold is one of those times.

With that one change, the profile files say the runtime drops from 0.23 s to 0.20 s. The new cost centers:

COST CENTRE            MODULE                %time %alloc

fromList.(...)         SemilatticeSearchTree  18.4   20.6
nDigitYs               Main                   15.9   19.5
digitsIn.digitsIn'     Main                   10.0    9.1
justOnes               Main                    8.5   12.3
justOnes.pair          Main                    4.5    0.0
floorSqrt.floorSqrt'.y Main                    4.0    4.3
fromList               SemilatticeSearchTree   4.0    0.5
process                Main                    3.5    8.4
choices                Main                    3.5    6.5
findFirst              SemilatticeSearchTree   3.0    0.0
digitsIn               Main                    2.5    0.2
satisfy                SemilatticeSearchTree   2.5    0.0
meet                   SemilatticeSearchTree   2.5    2.7
justOnes.innards       Main                    2.0    1.1
floorSqrt              Main                    2.0    0.6
main                   Main                    1.5    0.5
dDigitYTree            Main                    1.5    0.0
yTree                  Main                    1.5    3.2
meet                   SemilatticeSearchTree   1.5    0.0
mkBranch               SemilatticeSearchTree   1.5    2.4
noTwos                 Main                    0.5    2.2
choose                 Main                    0.0    2.2

Peeking at the call stacks shows that by far the majority of calls to digitsIn.digitsIn' take place in floorSqrt, where we're counting bits and the values passed range up to a googol, which means we're talking up to maybe 330-bit numbers (100 / .30103). Going eight at a time helped, but apparently not enough if we're now taking ten percent of our time counting bits.


Popular posts from this blog

a longest path problem

No tutorial, I swear...

Bikeshedding leap years