We have yet to peek further into the profiling output; it's overdue.
The cost-centre listing tells you the total time spent in particular functions, and that helps find slow functions--but a function can dominate the profile without itself being slow: it may just be a function that someone's calling a lot.
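For anyone following along at home: the output quoted in these posts is GHC's profiler at work. A typical invocation, assuming a single-file program called Main.hs (the flags are standard GHC profiling options, not something specific to this program), looks like:

```shell
# Build with profiling enabled; -fprof-auto adds a cost centre to every
# top-level binding (GHC 7.4 and later).
ghc -O2 -prof -fprof-auto -rtsopts Main.hs -o main

# Run with the -p RTS flag; the cost-centre summary and the call tree
# land in main.prof.
./main +RTS -p -RTS < input.txt
```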
Perhaps that's the case with tenToThe. The call tree shows calls in context, and when we look at it we see the following (edited slightly due to line length):
noTwos Main 184 43 0.8 2.0 31.9 28.6
tenToThe Main 229 21 0.0 0.0 0.0 0.0
noTwos.base Main 186 42 0.0 0.0 31.2 26.6
justOnes Main 187 42 8.8 13.3 31.2 26.6
justOnes.pair Main 195 85582 7.3 6.4 17.3 6.4
tenToThe Main 196 171164 10.0 0.0 10.0 0.0
justOnes.shell Main 191 42 0.0 0.0 0.0 0.0
tenToThe Main 192 42 0.0 0.0 0.0 0.0
justOnes.halfN Main 190 42 0.0 0.0 0.0 0.0
justOnes.innards Main 188 42 0.4 1.0 5.0 7.0
choices Main 189 60804 4.6 5.9 4.6 5.9
noTwos.halfN Main 185 42 0.0 0.0 0.0 0.0
pair is the hot spot for tenToThe calls. Let's see what we can do about pair.
Originally:
justOnes :: Int -> Int -> [Integer]
justOnes n o =
  let halfN = n `div` 2
      shell = tenToThe (n - 1) + 1
      innards = concat [(halfN - 1) `choices` k | k <- [0..o]]
      pair i = tenToThe i + tenToThe (n - (i + 1))
  in [shell + sum (map pair c) | c <- innards]
First time I ever used "innards" in a program! Let's memoize pair (are you starting to notice a theme here?), giving
ascending = take (halfN - 1) (drop 1 powersOfTen)
descending = take (halfN - 1) (reverse (take (n - 1) powersOfTen))
memoPair = zipWith (+) ascending descending
pair i = memoPair !! (i - 1)
zipWith generalizes zip; in fact, you could write zip as
zipWith (\x y -> (x, y))
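As a self-contained sketch of the memoization (with powersOfTen written out, since its definition doesn't appear in this post, and the memo list wrapped in a function of my own naming):

```haskell
-- powersOfTen as assumed here: the infinite list 1, 10, 100, ...
powersOfTen :: [Integer]
powersOfTen = iterate (* 10) 1

-- The memo list for a given n: element (i - 1) is
-- tenToThe i + tenToThe (n - (i + 1)), for i in [1 .. halfN - 1].
memoPairFor :: Int -> [Integer]
memoPairFor n = zipWith (+) ascending descending
  where halfN = n `div` 2
        ascending = take (halfN - 1) (drop 1 powersOfTen)
        descending = take (halfN - 1) (reverse (take (n - 1) powersOfTen))
```

In GHCi, memoPairFor 8 is [1000010,100100,11000], i.e. pair 1, pair 2, and pair 3 for n = 8; indexing with !! then replaces the repeated tenToThe calls.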
Results? Best observed time output:
real 0m0.234s
user 0m0.216s
sys 0m0.016s
Worst observed time output:
real 0m0.242s
user 0m0.228s
sys 0m0.012s
And the cost centers:
COST CENTRE MODULE %time %alloc
nDigitYs Main 21.8 19.1
fromList.(...) SemilatticeSearchTree 11.4 20.2
justOnes Main 10.0 14.2
digitsIn.digitsIn' Main 9.5 8.9
fromList SemilatticeSearchTree 4.7 0.5
justOnes.pair Main 4.3 0.0
choices Main 4.3 6.4
process Main 3.8 8.3
floorSqrt.floorSqrt'.y Main 3.8 4.2
digitsIn Main 2.4 0.2
satisfy SemilatticeSearchTree 2.4 0.6
mkBranch SemilatticeSearchTree 2.4 2.4
choose Main 1.9 2.2
meet SemilatticeSearchTree 1.9 0.0
satisfy SemilatticeSearchTree 1.9 0.0
findFirst SemilatticeSearchTree 1.9 0.0
numNDigitYs.numNoTwos.n' Main 1.4 0.9
noTwos Main 0.9 2.1
meet SemilatticeSearchTree 0.9 2.6
yTree Main 0.5 3.1
justOnes.innards Main 0.0 1.1
tenToThe is out of the running, and other things are moving up in percentage of time. Even with the new list, the total allocation is down from about 165 MB to 155 MB.
It's important to be guided by data. Looking at another part of the code,
noTwos :: Int -> [Integer]
noTwos n
  | n == 1 = [1, 3]
  | even n = base
  | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
  where halfN = n `div` 2
        base = justOnes n (min 3 (halfN - 1))
The otherwise (actually odd n) case seems like it's doing a bunch of work, creating a bunch of two-item lists and concatenating them... but changing it to
| otherwise = base ++ [p + tenToThe halfN | p <- base]
turned out to be slower.
I would say that this is the first optimization I've done that is arguably less than clear. (OK, maybe the switch from String to ByteString, but I would disagree.)
random notes and thoughts, mostly about Haskell these days, of a rather past middle-aged programmer
Saturday, June 01, 2013
Thursday, May 30, 2013
Up to date... or not
Well... I have ghc 7.4.1. It may be that more recent versions of ghc do a better job of optimization; they're up to 7.6.something.
Bodhi Linux is based on LTS releases of Ubuntu, and it doesn't look like the 12.04 repositories have been keeping up. I'd hate to have to switch distributions to get the latest ghc, but it may come to that.
UPDATE: BWAHAHAHA! I thought that the more recent version was better because it was allocating half as much memory. It was, instead, that I had a 32-bit version in one case, 64-bit in the other. Now I get to decide whether to plow onward with Ubuntu 13.04, which I installed to get ghc 7.6.something but which is flaky in some respects, or drop back to Bodhi Linux.
Wednesday, May 29, 2013
String: elegant but deadly
In Haskell, String is a synonym for [Char], i.e. a list of Char. This is a very elegant way to deal with strings--but it's a killer for performance.
The recommended way around this is to use the ByteString type. It's not a drop-in replacement for String, sad to say; in particular, replacing read and show is problematic. I lucked out on reading Integers, though the returned type is different, and I wimped out on generating the output. You also get to add an import line, in this case
import Data.ByteString.Char8 as B
and qualify some function names. Here's the revised main and supporting functions:
main = do
  s <- B.getContents
  let r = map format $ zip [1..] (map process (tail $ B.lines s))
  putStr (unlines r)

process :: B.ByteString -> Int
process line = case map B.readInteger (B.split ' ' line) of
  [Just (m, _), Just (n, _)] -> numXs m n
  _ -> -1
format :: (Int, Int) -> String
format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result
Is it worth it? Indeed it is. time output:
real 0m0.271s
user 0m0.240s
sys 0m0.028s
The cost centers:
COST CENTRE MODULE %time %alloc
nDigitYs Main 16.6 17.9
fromList.(...) SemilatticeSearchTree 13.0 18.9
tenToThe Main 12.6 0.0
justOnes Main 8.3 13.3
digitsIn.digitsIn' Main 7.9 8.4
justOnes.pair Main 6.7 6.4
fromList SemilatticeSearchTree 4.0 0.4
process Main 3.6 7.8
satisfy SemilatticeSearchTree 3.6 0.0
meet SemilatticeSearchTree 3.2 0.0
yTree Main 2.8 2.9
choices Main 2.8 6.0
floorSqrt.floorSqrt'.y Main 2.8 4.0
numNDigitYs Main 1.2 0.1
choose Main 1.2 2.0
noTwos Main 1.2 2.0
digitsIn Main 1.2 0.2
meet SemilatticeSearchTree 1.2 2.5
bound SemilatticeSearchTree 1.2 0.0
findFirst SemilatticeSearchTree 1.2 0.0
mkBranch SemilatticeSearchTree 0.4 2.2
justOnes.innards Main 0.0 1.0
main and main.r are nowhere to be seen. process is down from 35.9% time, 41.7% allocation, to 3.6% and 7.8% respectively.
Total allocation: 165,375,112 bytes, less than half of what it was. Not shabby.
P.S. Just before this change, the input file with much smaller values but ten times as many lines took well over half the time of the input file with huge values. That's no longer the case:
real 0m0.083s
user 0m0.076s
sys 0m0.004s
P.P.S. process is a bit brittle; if someone left more than one space between the two values on a line, B.split will obligingly represent the stuff between them with an empty string. We really should filter those out. UPDATE: it turns out that ByteString has words, too; we're safe.
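Here's a sketch of the words-based parse that update hints at; parseLine and its Maybe result type are my names, not the post's, and B.words splits on any run of whitespace, so the extra spaces are harmless:

```haskell
import qualified Data.ByteString.Char8 as B

-- Like process, but using B.words (which collapses runs of whitespace)
-- and an explicit Maybe instead of the -1 sentinel.
parseLine :: B.ByteString -> Maybe (Integer, Integer)
parseLine line = case map B.readInteger (B.words line) of
  [Just (m, _), Just (n, _)] -> Just (m, n)
  _ -> Nothing
```

parseLine (B.pack "12   34") gives Just (12, 34), where B.split ' ' would have tripped over the empty fields between the numbers.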
Tuesday, May 28, 2013
Things that make you go "Hmmm..."
OK. I rolled my own main, with serious help from "Haskell I/O for Imperative Programmers"--and I hasten to add that if you, like me, are learning Haskell, run, do not walk, to this page and read it.
Doing I/O, at least of the sort that programs like this do, in Haskell requires an attitude adjustment if you want to work with Haskell rather than against it. If you're a shell scripter, you already have the idea: you've worked with pipes, and pipes are like function composition.
So, here's my main. I toyed with using interact, but having to skip the first line (the one that tells you how many lines are left) made it problematic, as well as the issue of adding a "case number" for getting the output in the form that the problem statement requires.
-- A new main, heavily inspired by "Haskell I/O for Imperative Programmers"
-- http://www.haskell.org/haskellwiki/Haskell_IO_for_Imperative_Programmers
-- (What's with that tail, you ask? We're skipping that first line that says
-- how many more lines there are. They could have been persnickety and padded
-- the file with additional lines to make sure you have to pay attention to
-- that first line, but they weren't.)
main = do
  s <- getContents
  let r = map format $ zip [1..] (map process (tail $ lines s))
  putStr (unlines r)

process :: String -> Int
process line = let [m, n] = map read (words line)
               in numXs m n

format :: (Int, Int) -> String
format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result
So, it compiled, it ran, it gave the same results as before. What about the time and the cost centers? The time seems to be a little bit slower, though there's some overlap between it and the previous version. This version seems to take somewhere between 0.38 and 0.4 seconds. This despite the following profiler output:
COST CENTRE MODULE %time %alloc
process.(...) Main 35.9 41.7
nDigitYs Main 11.1 8.4
tenToThe Main 8.1 0.0
fromList.(...) SemilatticeSearchTree 7.6 8.9
digitsIn.digitsIn' Main 5.5 3.9
justOnes.pair Main 5.1 3.0
main.r Main 4.1 6.0
justOnes Main 3.9 6.2
main Main 3.2 9.5
fromList SemilatticeSearchTree 1.8 0.2
floorSqrt.floorSqrt'.y Main 1.4 1.8
meet SemilatticeSearchTree 1.4 0.0
choose Main 1.2 1.0
choices Main 1.2 2.8
mkBranch SemilatticeSearchTree 0.7 1.0
yTree Main 0.2 1.4
meet SemilatticeSearchTree 0.2 1.1
Note that process only chews up 35.9% of the time, 41.7% of the allocation, compared with 39.5% and 56.0% respectively for solve under the former regime. main and its immediate helpers are all that changed. How could execution time increase if the only part that changed decreased the percentage of execution time it took? It is, as someone once said, a puzzlement. I think I like this main, though, and will see if there's anything I can do to improve it.
UPDATE: Aha! Note that we've added a couple of cost centers: main.r and main. They collectively add up to 43.2% of time and 57.2% of allocation. That's how the result can take longer despite the distribution being flatter.
Monday, May 27, 2013
Talk about TMTOWTDI...
At least from my newbie POV, the variety of ways people set up the top levels of their "Fair and Square" programs is pretty amazing.
Some use the standard issue recursive versions of top-level loops you see in texts; some use replicateM. Some ignore the first line, some don't.
I have some reading to do.
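For the record, a replicateM-style driver of the sort mentioned above might look like this sketch (solveCase is a placeholder of my own, not from any of the posted solutions):

```haskell
import Control.Monad (replicateM)

-- Format one answer per case, Code Jam style.
formatCases :: [String] -> [String]
formatCases answers =
  ["Case #" ++ show i ++ ": " ++ a | (i, a) <- zip [1 :: Int ..] answers]

main :: IO ()
main = do
  t <- readLn                    -- the first line gives the case count
  cases <- replicateM t getLine  -- read exactly t more lines
  mapM_ putStrLn (formatCases (map solveCase cases))
  where solveCase = id           -- placeholder for the real per-case work
```

replicateM t getLine sidesteps end-of-file detection entirely, which is exactly what the count-prefixed input format invites.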
The remaining time sinks
Counting Digits
We unrolled digitsIn' before; perhaps we could use another shot. Add another couple of lines for base^8 and the cost centers change:

COST CENTRE MODULE %time %alloc
solve Main 39.5 56.0
nDigitYs Main 11.2 8.6
fromList.(...) SemilatticeSearchTree 8.2 9.1
tenToThe Main 7.9 0.0
justOnes.pair Main 4.2 3.1
justOnes Main 3.7 6.4
digitsIn.digitsIn' Main 3.0 4.1
floorSqrt.floorSqrt'.y Main 2.8 1.9
meet SemilatticeSearchTree 2.3 0.0
satisfy SemilatticeSearchTree 2.1 0.0
fromList SemilatticeSearchTree 1.9 0.2
digitsIn Main 1.6 0.1
findFirst SemilatticeSearchTree 1.2 0.0
choices Main 0.7 2.9
mkBranch SemilatticeSearchTree 0.7 1.1
yTree Main 0.5 1.4
meet SemilatticeSearchTree 0.5 1.2
Not shabby. It's down from 11.5% of the time to 3%, and 7.6% of allocations to 4.1%. If we were hard core, we'd go ahead and do specific base 10 and base 2 versions and avoid recalculating the powers of the base, but there are bigger fish to fry, especially the elephant in the room (metaphor blender set to purée!), solve.
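The post doesn't show digitsIn itself, so here's a hypothetical reconstruction of the unrolled counting loop it describes, stepping by base^8, base^4, and base^2 before falling back to single divisions:

```haskell
-- Hypothetical reconstruction; the real digitsIn isn't shown in the post.
-- Counts how many base-`base` digits n has (n assumed non-negative).
digitsIn :: Integer -> Integer -> Int
digitsIn base n = digitsIn' n 1
  where base2 = base * base
        base4 = base2 * base2
        base8 = base4 * base4
        digitsIn' m d
          | m >= base8 = digitsIn' (m `div` base8) (d + 8)
          | m >= base4 = digitsIn' (m `div` base4) (d + 4)
          | m >= base2 = digitsIn' (m `div` base2) (d + 2)
          | m >= base = digitsIn' (m `div` base) (d + 1)
          | otherwise = d
```

With big steps first, a 50-digit Integer costs a handful of comparisons instead of 50 single divisions; digitsIn 10 999 is 3 and digitsIn 10 1000 is 4.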
I/O Bound?
So here's that outer layer, courtesy of jagd:

solve i num = do
  [a, b] <- getLine >>= return . map read . words
  putStrLn $ "Case #" ++ show i ++ ": " ++ (show $ numXs a b)
  if i < num
    then solve (i + 1) num
    else return ()

main = do
  num <- getLine >>= return . read
  solve 1 num
The input is reminiscent of the old days of FORTRAN 66 where, rather than trying to detect end of file, you'd start your input card deck with a card that told your code how many more cards there were to read.
We've moved from generating and testing a huge list of palindromes to the point that the above code takes two fifths of the CPU time and does nearly three fifths of the trips to the heap. What is this doing to take so much time with an input of just 1,001 lines comprising a little over 160K of data?
Once I thought it was the repeated walks to concatenate up a String to feed to putStrLn, so I broke the one putStrLn down into three putStr calls and a putStrLn, but it made no difference that I could see, so I suspect it's the input.
I've heard that one issue is the use of String, a synonym for [Char], and that there's a ByteString type that carries less overhead... but that doesn't suggest anything, except maybe trying to ask for the flavor of getLine that hands back a ByteString. I'll investigate that, and also take a look at how some of the other posted solutions read and parse the input, to see what I can learn. It bugs the heck out of me to think that I'm spending that much time and taking so many trips to the well to read such a dinky file.
UPDATE: Out of idle curiosity, I ran the first large input file, 10,000 ranges bounded above by 10^14. time output:
real 0m0.300s
user 0m0.288s
sys 0m0.008s
The profiler output shows solve taking not quite three fourths of the time and doing 90% of the allocation! There's got to be a better way.
Qapla'!
I finally got things in sync with van Laarhoven's module, and the output is correct. (Just to make sure, I went back to the Code Jam site and submitted the output. I think that being up until nearly 3:00 a.m. caused my initial reaction to
to be "Oh, [expletive]! The judge says I need to correct my last submission!" Then I realized it was an adjective, not a command, and relaxed.)
I'm a little bummed at using someone else's code for a nontrivial portion of the problem, but OTOH, I did learn more Haskell, which is one of my real goals.
But back to the results: the best time output is
real 0m0.361s
user 0m0.336s
sys 0m0.024s
and the worst is
real 0m0.377s
user 0m0.364s
sys 0m0.008s
Both are less than the 0.4 seconds I saw for the C++ solution I tried.
[We pause here to do a victory dance.]
Where are our cost centers, and how about that memory usage?
COST CENTRE MODULE %time %alloc
solve Main 37.7 54.0
digitsIn.digitsIn' Main 11.5 7.6
nDigitYs Main 9.5 8.3
fromList.(...) SemilatticeSearchTree 6.8 8.8
tenToThe Main 6.2 0.0
justOnes Main 4.6 6.2
justOnes.pair Main 3.3 3.0
satisfy SemilatticeSearchTree 2.0 0.0
choices Main 1.8 2.8
floorSqrt.floorSqrt'.y Main 1.8 1.8
fromList SemilatticeSearchTree 1.8 0.2
yTree Main 1.5 1.4
floorSqrt Main 1.1 0.3
digitsIn Main 1.1 0.1
meet SemilatticeSearchTree 1.1 1.1
mkBranch SemilatticeSearchTree 1.1 1.0
findFirst SemilatticeSearchTree 1.1 0.0
Aside from solve and digitsIn.digitsIn', the time distribution is a lot flatter. Now it's even more astonishing just how much time and allocation solve, which just reads the file, calls the function, and prints out the results, takes! Any serious improvement is going to have to come from there.
We are definitely trading space for time. Previous graphs of memory usage peaked at around four megabytes, but this version just about doubles that.
UPDATE: ...and you know, we did tack those ordinal positions on with each Y value and added the tree structure and the upper bound info, so doubling sounds plausible.
Sunday, May 26, 2013
Almost there...
I have something that compiles... but doesn't give the correct result. I suspect I've blown the interface to van Laarhoven's SemilatticeSearchTree module in a way that, while it passes syntactic muster, is keeping the search from working right. The presence of a bunch of zeroes in the output might indicate that I'm consistently getting back Nothing.
What does the time output (0.324 to 0.347 s real time) tell us about what we might see when it's correct? Well, it's certainly a lower bound--but how much longer will the correct version run? The buggy one has presumably built the trees, and is doing all the work, but if the maxima are taking it immediately to Nothing, we're only skipping a couple of O(log(n)) operations times a thousand input lines, where n is no bigger than a bit over 4,000 (the number of 49-digit Ys). I'm hoping the corrected version won't take all that much longer... but we can't know until we have it right. More news as it happens.