Wednesday, May 29, 2013

String: elegant but deadly

In Haskell, String is a synonym for [Char], i.e. a list of Char. This is a very elegant way to deal with strings--but it's a killer for performance.

The recommended way around this is to use the ByteString type. It's not a drop-in replacement for String, sad to say; in particular replacing read and show are problematic. I lucked out on reading Integers, though the returned type is different, and I wimped out on generating the output. You also get to add an import line, in this case

import Data.ByteString.Char8 as B

and qualify some function names.Here's the revised main and supporting functions:

main = do
    s <- B.getContents
    let r = map format $ zip [1..] (map process (tail $ B.lines s))
    putStr (unlines r)

process :: B.ByteString -> Int

process line = case map B.readInteger (B.split ' ' line) of
               [Just (m, _), Just (n, _)] -> numXs m n
               _                          -> -1

format :: (Int, Int) -> String

format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result


Is it worth it? Indeed it is. time output:

real    0m0.271s
user    0m0.240s
sys     0m0.028s


The cost centers:

COST CENTRE            MODULE                %time %alloc

nDigitYs               Main                   16.6   17.9
fromList.(...)         SemilatticeSearchTree  13.0   18.9
tenToThe               Main                   12.6    0.0
justOnes               Main                    8.3   13.3
digitsIn.digitsIn'     Main                    7.9    8.4
justOnes.pair          Main                    6.7    6.4
fromList               SemilatticeSearchTree   4.0    0.4
process                Main                    3.6    7.8
satisfy                SemilatticeSearchTree   3.6    0.0
meet                   SemilatticeSearchTree   3.2    0.0
yTree                  Main                    2.8    2.9
choices                Main                    2.8    6.0
floorSqrt.floorSqrt'.y Main                    2.8    4.0
numNDigitYs            Main                    1.2    0.1
choose                 Main                    1.2    2.0
noTwos                 Main                    1.2    2.0
digitsIn               Main                    1.2    0.2
meet                   SemilatticeSearchTree   1.2    2.5
bound                  SemilatticeSearchTree   1.2    0.0
findFirst              SemilatticeSearchTree   1.2    0.0
mkBranch               SemilatticeSearchTree   0.4    2.2
justOnes.innards       Main                    0.0    1.0


main and main.r are nowhere to be seen. process is down from 35.9% time, 41.7% allocation, to 3.6% and 7.8% respectively.

Total allocation: 165,375,112 bytes, less than half of what it was. Not shabby.

P.S. Just before this change, the input file with much smaller values but ten times as many lines took well over half the time of the input file with huge values. That's no longer the case:

real    0m0.083s
user    0m0.076s
sys     0m0.004s

 
P.P.S. process is a bit brittle; if someone left more than one space between the two values on a line, B.split will obligingly represent the stuff between them with an empty string. We really should filter those out. UPDATE: turns out the ByteString has words; we're safe.

No comments:

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...