So I thought that
ysInDigitRange d1 d2 = sum [numNDigitYs d | d <- [d1..d2]]
did some needless recalculation, and tried
memoNumNDigitYs = map numNDigitYs [1..]
nDigitYsPartialSums = scanl (+) 0 memoNumNDigitYs
ysInDigitRange d1 d2 = nDigitYsPartialSums !! d2 - nDigitYsPartialSums !! (d1 - 1)
scanl? Ah, that's like foldl, but instead of giving you the final result of the fold, it gives you a list: at the head is the "initial" value, followed by the results of the operation at each stage. nDigitYsPartialSums !! i is thus equal to
sum (map numNDigitYs [1..i])
so that the two formulations of ysInDigitRange give the same result. Trying it, though, showed the scanl version taking a little more time, a little less total allocation--but actually allocating about nine megabytes! I didn't expect that. Just goes to show that not everything you think will be an optimization actually is.
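To see the scanl trick in isolation, here's a minimal sketch with a stand-in cost function (the real numNDigitYs isn't shown in this post, so `cost` below is purely hypothetical):

```haskell
-- Stand-in for numNDigitYs, purely for illustration.
cost :: Int -> Integer
cost d = fromIntegral (d * d)

-- Lazily memoized costs for d = 1, 2, 3, ...
memoCost :: [Integer]
memoCost = map cost [1 ..]

-- scanl (+) 0 keeps every partial sum:
-- partialSums !! i == sum (map cost [1..i])
partialSums :: [Integer]
partialSums = scanl (+) 0 memoCost

-- A range sum is then a difference of two partial sums.
rangeSum :: Int -> Int -> Integer
rangeSum d1 d2 = partialSums !! d2 - partialSums !! (d1 - 1)

main :: IO ()
main = do
  print (take 5 partialSums)                        -- [0,1,5,14,30]
  print (rangeSum 2 4)                              -- 29
  print (rangeSum 2 4 == sum (map cost [2 .. 4]))   -- True
```

Each element of memoCost is computed at most once, however many times rangeSum is called over it.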
UPDATE: OK, now I'm puzzled. The profiler output shows ysInDigitRange being invoked 325 times. Ditto for numNDigitYs, but... but... the only calls to numNDigitYs should be where values in memoNumNDigitYs are calculated, and we shouldn't ask for more than 50 of those, so what's with the other 275 invocations?
UPDATE: Comparing profiler output showed what I was overlooking. Before this change, there were a lot more calls to numNDigitYs. We're talking 2,732 + 325 instead of just 50 + 325, down by a factor of more than eight. (OK, that's where the 50 went, but where do the 325 come from?)
Moreover, the time percentage of the top cost centers falls off a lot quicker with the change. In the past, that's tended to happen along with a decrease in execution time, but here it's gone up a bit... and why on earth does keeping around what should be just an extra 101 Integers cause such a jump in memory usage?! Time to generate some of those other memory graphs to see whether we can figure it out.
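For the record, those numbers and graphs come from GHC's profiling machinery; assuming a single-file program Main.hs (a hypothetical name), the invocations are roughly:

```
ghc -O2 -prof -fprof-auto -rtsopts Main.hs
./Main +RTS -p      # time/allocation profile, written to Main.prof
./Main +RTS -hc     # heap profile by cost centre, written to Main.hp
hp2ps -c Main.hp    # render the heap profile as Main.ps
```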
D'Oh! I forgot... I'd switched to -fllvm. LLVM is the compiler infrastructure whose intermediate representation clang compiles to, and if you have it installed, you can ask ghc to use it as a code generator. That, or rather not doing that, made the difference in memory usage and time. We're back to a bit over 6 MB, and with profile output claiming 0.19 seconds. Best time output:
real 0m0.227s
user 0m0.184s
sys 0m0.040s
Worst time output:
real 0m0.233s
user 0m0.220s
sys 0m0.008s
I'm amazed that llvm makes that much difference. (Compilation takes a bit longer, but darn, it's worth it.)
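For reference, switching backends is just a compile flag (again assuming a hypothetical Main.hs, and that a GHC-supported LLVM toolchain, i.e. opt and llc, is on the PATH):

```
# Native code generator (the default):
ghc -O2 Main.hs -o main-ncg

# LLVM backend:
ghc -O2 -fllvm Main.hs -o main-llvm

time ./main-llvm
```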