- nDigitYs
- fromList
- digitsIn.digitsIn'
- justOnes
- justOnes.pair
- floorSqrt.floorSqrt'.y
[shell + sum (map pair c) | c <- innards]
It turns out to chop an easy 10% off the execution time, according to profiling output, to recast that as
[foldl' (+) shell (map pair c) | c <- innards]
"WTF?" you say? Well, folding a list takes several things:
- a function (we're using (+), the parentheses telling Haskell we're not trying to add a function to an Integer)
- a starting value (we're using shell)
- the list to be folded
a -> a -> a
Instead, you could have a starting value of type b and then the function would have type
b -> a -> b
which you'll see for things like fromList, which builds one data structure from another. Since we want shell plus a sum, we might as well put shell in as the starting value rather than using 0 and adding shell when the smoke clears.
Probably more important is the way that foldl' is strict. Laziness is Haskell's outstanding virtue, but there are times when you want strict, and avoiding bulding up potentially large stacks of thunks awaiting execution for a fold is one of those times.
With that one change, the profile files say the runtime drops from 0.23 s to 0.20 s. The new cost centers:
COST CENTRE MODULE %time %alloc
fromList.(...) SemilatticeSearchTree 18.4 20.6
nDigitYs Main 15.9 19.5
digitsIn.digitsIn' Main 10.0 9.1
justOnes Main 8.5 12.3
justOnes.pair Main 4.5 0.0
floorSqrt.floorSqrt'.y Main 4.0 4.3
fromList SemilatticeSearchTree 4.0 0.5
process Main 3.5 8.4
choices Main 3.5 6.5
findFirst SemilatticeSearchTree 3.0 0.0
digitsIn Main 2.5 0.2
satisfy SemilatticeSearchTree 2.5 0.0
meet SemilatticeSearchTree 2.5 2.7
justOnes.innards Main 2.0 1.1
floorSqrt Main 2.0 0.6
main Main 1.5 0.5
dDigitYTree Main 1.5 0.0
yTree Main 1.5 3.2
meet SemilatticeSearchTree 1.5 0.0
mkBranch SemilatticeSearchTree 1.5 2.4
noTwos Main 0.5 2.2
choose Main 0.0 2.2
Peeking at the call stacks shows that by far the majority of calls to digitsIn.digitsIn' take place in floorSqrt, where we're counting bits and the values passed range up to a googol, which means we're talking up to maybe 330-bit numbers (100 / .30103). Going eight at a time helped, but apparently not enough if we're now taking ten percent of our time counting bits.
No comments:
Post a Comment