What next?

Yesterday it occurred to me that the profiling output might be more accurate if I just asked for -p, rather than always asking for the heap tracking. The first part of the profile output:

    Mon Jul  1 22:23 2013 Time and Allocation Profiling Report  (Final)

       ultimatepalindrome20 +RTS -p -RTS

    total time  =        0.17 secs   (167 ticks @ 1000 us, 1 processor)
    total alloc = 116,700,576 bytes  (excludes profiling overheads)

COST CENTRE            MODULE                %time %alloc

fromList.(...)         SemilatticeSearchTree  19.2   26.8
ysPair.noTwos          Main                   11.4    8.9
floorSqrt.floorSqrt'.y Main                   10.2    5.6
choices                Main                    7.8   10.7
process                Main                    7.2   11.0
fromList               SemilatticeSearchTree   6.6    0.6
ysPair.spread.(...)    Main                    4.8    2.4
ysPair.noTwos'         Main                    4.2    4.2
meet                   SemilatticeSearchTree   3.6    0.0
satisfy                SemilatticeSearchTree   3.0    0.0
main                   Main                    1.8    1.0
numYs                  Main                    1.8    0.1
ysPair.pairSum         Main                    1.8    3.5
makeYTree              Main                    1.8    4.1
bDigits                Main                    1.8    0.0
meet                   SemilatticeSearchTree   1.8    3.5
mkBranch               SemilatticeSearchTree   1.8    3.2
ysPair.noTwosChoices   Main                    1.2    0.7
bDigits.bDigits'       Main                    1.2    1.9
bound                  SemilatticeSearchTree   1.2    0.0
ysPair.noTwos'.base    Main                    0.6    1.2
ysPair.spread          Main                    0.6    2.5
ysPair.twoNPlus1List   Main                    0.0    2.9

The cost of ysPair is spread out over several lines; add them up and you get 32.4% of the time (choices count; there's only one caller) and 37% of the allocation... but then, the rest of the profile does the adding for us; let's check it out. OK, I was pretty close. fromList with its descendants takes up about another third of the time and allocation.

floorSqrt and ceilSqrt should be low-hanging fruit. As we've said before, it converges quadratically, doubling the number of valid bits each time around... but for numbers of up to 330 bits, that's as many as nine iterations starting from just one good bit as floorSqrt does. So, the temptation is to use the double precision square root directly where possible, and use it as a first approximation otherwise. With 52 good bits as a starting point, three iterations will do the job for values up to a googol.

On the other hand, coming up with some way to use the semilattice search trees to just search on one value (and only store the maximum value searched for as the meet!), with the rest of the node along for the ride, would save time and memory and be more generally applicable, and would help me with Haskell in general. That's the way to go.


Popular posts from this blog

a longest path problem

No tutorial, I swear...

Bikeshedding leap years