Sunday, June 09, 2013

Overkill (not really...)

So I thought that

ysInDigitRange d1 d2 = sum [numNDigitYs d | d <- [d1..d2]]

did some needless recalculation, and tried

memoNumNDigitYs = map numNDigitYs [1..]
nDigitYsPartialSums = scanl (+) 0 memoNumNDigitYs

ysInDigitRange d1 d2 = nDigitYsPartialSums !! d2 - nDigitYsPartialSums !! (d1 - 1)


scanl? Ah, that's like foldl, but instead of giving you the final result of the fold, it gives you a list: at the head is the "initial" value, followed by the results of the operation at each stage. nDigitYsPartialSums !! i is thus equal to

sum map numNDigitYs [1..i]

so that the two formulations of ysInDigitRange give the same result. Trying it, though, showed the scanl version taking a little more time, a little less total allocation--but actually allocating about nine megabytes! I didn't expect that. Just goes to show that not everything you think will be an optimization actually is.

UPDATE: OK, now I'm puzzled. The profiler output shows ysInDigitRange being invoked 325 times. Ditto for numNDigitYs, but... but... the only calls to numNDigitYs should be where values in memoNumNDigitYs are calculated, and we shouldn't ask for more than 50 of those, so what's with the other 275 invocations?

UPDATE: Comparing profiler output showed what I was overlooking. Before this change, there were a lot more calls to numNDigitYs. We're talking 2,732 + 325 instead of just 50 + 325, down by a factor of more than eight. (OK, that's where the 50 went, but where do the 325 come from?)

Moreover,  the time percentage of the top cost centers falls off a lot quicker with the change. In the past, that's tended to happen along with a decrease in execution time, but here it's gone up a bit... and why on earth does keeping around what should be just an extra 101 Integers cause such a jump in memory usage?! Time to generate some of those other memory graphs to see whether we can figure it out.

D'Oh! I forgot... I'd switched to -fllvm. LLVM is the abstract machine used as an intermediate form by clang, and if you have it installed, you can ask ghc to use it. That, or rather not doing that, made the difference in memory usage and time. We're back to a bit over 6 MB, and with profile output claiming .19 seconds. Best time output:

real    0m0.227s
user    0m0.184s
sys     0m0.040s


Worst time output:

real    0m0.233s
user    0m0.220s
sys     0m0.008s


I'm amazed that llvm makes that much difference. (Compilation takes a bit longer, but darn, it's worth it.)

No comments:

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...