Aging Nerd Notes: 2013

Sunday, December 29, 2013

Someone has done this before...

Happened across it while looking for Haskell-related blogs to link to: from Derek Wyatt's blog, a post titled "Haskell sequence over functions -- explained" gives this example:

sequence [(> 4), (< 10), odd] 7

gives the result we would get from our wonkyMap with the same arguments:

[True, True, True]

What is this sequence?

:t sequence
sequence :: Monad m => [m a] -> m [a]

If m is a monad, then sequence will map a [m a] to a m [a].

So, our functions must have the type m a for some m and a. Let's inquire with ghci once again.

:t sequence [(> 4), (< 10), odd]
sequence [(> 4), (< 10), odd] :: Integral a => a -> [Bool]

Aha. That reminds me of the exercise in Typeclassopedia: show that (->) r is a functor. (WTF? That's the type of functions that take an argument of type r.) Turns out that not only is it a functor, it's a monad; Typeclassopedia refers to it as the "reader monad".

Unfortunately, we have dueling definitions here. Mr. Wyatt's blog post uses the Learn You a Haskell for Great Good! definition, as opposed to what we see from ghci, though clearly the end result is the same. Let's carry on with the version that ghci uses, and you neither want nor need to see my long, rambling, not-yet-successful attempt to convince myself that (->) r really is a monad. We'll just suppose it is for now, and see what the consequences are.

Haskell's type inference takes the most general possible type. The functions we have here are all Integral a => a -> Bool. In fact, I suspect Integral b => b is r for this example, using b because the sequence declaration has already used a, and I don't need help getting more confused. For us, the a mentioned in sequence's declaration is Bool, and in our case, we have

sequence :: Integral b => [b -> Bool] -> b -> [Bool]

which is exactly what we wanted for our solution to the problem in that online Haskell course.

So, bye-bye wonkyMap; we can rewrite strong as

strong password = all $ sequence constraints password
where constraints = [(>= 15) . length,
any isUpper,
any isLower,
any isNumber]

...and I definitely need to get my head around monads, if for no other reason than to come up with alternative Hoogle search strings. :)

Wednesday, December 25, 2013

Only a few years late...

So I was looking at a list of requirements, trying to make sense of them and make sure I would come up with something that would satisfy them, when it occurred to me that you should bypass the middleman. Requirements should be written in the form of tests that pass iff the software satisfies them.

Turns out I'm a few years late. Dan North has already come up with the notion of "behavior-driven development". There are now parsers for "ubiquitous languages" in which one writes descriptions of how a piece of software should behave if you do certain things, and from those descriptions one generates tests. One example: Cucumber. (Why Cucumber? I think it's from the convention of having output from tests that fail come out in red while tests that pass generate output in green--like a cucumber.)

Haskell has some very neat facilities for testing, e.g. QuickCheck and HUnit. Is there a behavior-driven development package for Haskell? Why yes, there is: HSpec. Sweet.

Wednesday, December 18, 2013

You'd think someone's done this before

Saw a link to an Introduction to Haskell course set up by a student at the University of Virginia (which inspired others), and saw one of the homework assignments: write a function

strong :: String -> Bool

which returns True iff the string passed to it is a "strong" password, which for purposes of the assignment means

it's at least 15 characters long
it includes upper case letters
it includes lower case letters
it includes digits

Easy enough to write, especially if you import Data.Char, but the lesson is about higher-order functions, and you're urged to use the things taught in the lesson (among which are the character classification functions in Data.Char). So it occurred to me that it would be good to write the function in a way that makes it easy to add other constraints--perhaps it shouldn't be in some list of bad password choices, say.

So, we'd like to write it to take a list of String -> Bool, apply them to the String, and confirm that they all return True. The lesson teaches about all and about map, but we don't want map here.

map :: (a -> b) -> [a] -> [b]

but we want

wonkyMap :: [a -> b] -> a -> [b]

An obvious implementation is

wonkyMap [] _ = []

wonkyMap (f:fs) x = (f x) : (wonkyMap fs x)

though I'm sure it can be written more elegantly as a fold.

(UPDATE: Duh... list comprehensions are your friend.

wonkyMap fs x = [f x | f <- fs]

just as you could define

map f xs = [f x | x <- xs]

There's a pleasing symmetry there.)

Then you have

strong password = all $ wonkyMap constraints password

where constraints = [(> 15) . length,

any isUpper,

any isLower,

any isNumber]

OTOH, someone has to have come up with wonkyMap before--and given it a better name. Hoogling the signature, though, didn't turn up anything. Does it look familiar to anyone?

UPDATE: Oops... make that (>= 15) . length

Thursday, November 21, 2013

Project Euler #14 revisited

Recently I came a cross a link from... aargh, was it Reddit or Hacker News? I should have saved the link. In any case, it was from someone who had decided to try to optimize a solution to Project Euler #14 in Haskell, and it induced me to revisit the program I'd written for it. Euler #14 asks the musical question, which number less than a million takes the longest for its Collatz sequence to get to 1?

{-# LANGUAGE BangPatterns #-}

import Data.List
import Data.Ord

collatzChainLen :: Int -> Int
collatzChainLen n = collatzChainLen' n 1
    where collatzChainLen' n !l
            | n == 1    = l
            | otherwise = collatzChainLen' (collatz n) (l + 1)
          collatz n = if even n then n `div` 2 else 3 * n + 1

pairMap :: (a -> b) -> [a] -> [(a, b)]
pairMap f xs = [(x, f x) | x <- xs]

main :: IO ()
main = print $ fst (maximumBy (comparing snd) (pairMap collatzChainLen [1..999999]))

Compiled with ghc -O2, about 3.5 seconds on my 2.8 GHz Athlon 64.
Of course, the thing to do is memoize collatzChainLen; if you have a sequence {x₁, x₂, ..., x_n = 1, ...} you know the values for all the x_i, not just x₁.

But the first thing I did was say to myself, "Self, that collatzChainLen function is yet another example of grinding out yet another boilerplate recursive function rather than using the Prelude and getting in the groove of using what the language gives you, combining functisns as Mr. Hughes and Mr. Backus recommend. Surely we can be more Haskelly than that." (I swear I hadn't been watching Looney Tunes.)

The result:

collatzChainLen :: Int -> Int
collatzChainLen n = 1 + (length . takeWhile (/= 1) . collatzChain) n
    where collatzChain = iterate collatz
          collatz n = if even n then n `div` 2 else 3 * n + 1

We've made the acquaintance of iterate before; it generates as much of the endless chain as we wish to grab, which we wish to do until we find a 1 with takewhile (/= 1). length counts how many that is, and while it doesn't affect the comparison, we add 1 rather than fib about what we're calculating. One add shouldn't affect much (or should it? If we're not memoizing, we're doing it rather a lot).

Does that affect the time? It sure does, but not as we'd like. With that version of collatzChainLen, the program takes about 9.7 seconds to run, almost three times as long! What's the difference? "It's profilin' time", as Ben Grimm would say if he learned Haskell. The evidence is pretty clear:

first version: total alloc = 192,049,472 bytes
second version: total alloc = 21,277,532,952 bytes

Over a hundred times as much RAM, and who knows how much GC? I could believe that's the issue.

The lower-level version of collatzChainLen generates and counts values one at a time... but then, Haskell is lazy, so the higher-level one should, too--but perhaps something about creating a list of those values, or rather a couple, since takeWhile can't destructively chop off the list, but has to copy, is increasing the overhead. I recall that there's supposed to be something called "stream fusion" that gets rid of that overhead; let's look up how that's done.

Oops. haskell.org seems to be down, but apt-get install is my friend. Looks like there's a Data.Stream library, and you create streams instead of lists. Here goes:

import qualified Data.Stream as S

collatzChainLen :: Int -> Int
collatzChainLen n = 1 + (length . S.takeWhile (/= 1) . collatzChain) n
    where collatzChain = S.iterate collatz
          collatz n = if even n then n `div` 2 else 3 * n + 1

Time? Drat; still around 9.7 seconds. Doesn't seem very fused to me with that time, and a blog post about this very problem had example code that had

import qualified Data.List.Stream as S

and used S.length instead of just length. OK, rummage through synaptic for the libghc packages (sorry, but I've been burned by cabal), and...??? Darned if I can find it, and the blog post dates from August 2013, so it's not all that old.

While we're trying to figure out how to get to Data.List.Stream in Ubuntu, we can do one other thing:

collatz n = case n `divMod` 2 of
    (n', 0) -> n'
   _ -> 3 * n + 1

turns out to make a very respectable difference, cutting runtime down by 1.2 seconds! More news as it happens.

UPDATE: clearly

colllatzChainLen (2 * n) == 1 + collatzChainLen n

so, given a number < 500000, you could double it until it was in [500000, 999999] and have a number with a longer chain, so we can start with 500000.

Results: 4.6 seconds for the "Haskelly" version, 1.2 seconds for the explicitly written out collatzChainLen.

UPDATE: -fllvm is your friend; it took the runtimes down to 3.5 seconds for the higher-order function flavor... and 0.29 seconds for the version with the written-out collatzChainLen.

Saturday, November 09, 2013

Project Euler #78

If you have a pile of n coins, there's some number of distinct ways to divide them up into piles; let's refer to the function that maps a number of coins to the number of ways to divide them up into piles as p. The example they give is for five coins; as you can see, p 5 == 7.

OOOOO
OOOO O
OOO OO
OOO O O
OO OO O
OO O O O
O O O O O

Things to note:

the coins are all alike; it's not like you have different denominations that could distinguish one pile from another with the same number of coins.
order doesn't matter; OOO OO and OO OOO don't count as different.

One's first inclination is to say "If you create a pile of m coins, that leaves n - m, and there are p (n - m) ways to divide them up, right?" Not exactly; if you don't confine yourself to piles of m or fewer coins, you'll violate (2). So we need what we'll call p', where p' n m is the number of ways to divide up n coins into piles of size m or less. Then you have

p :: Int -> Int
p n = sum $ map (p' n) [1..n]

That leaves p'. There are some obvious base cases:

p :: Int -> Int -> Int
p' n m
    | m >= n - 1 = 1
    | m == 1     = 1

If not those, then it turns out to be

    | otherwise = sum [p' (n - m) i | i <- [1..m']]
                      where m' = min (n - m) m

That matches up with examples I did by hand, and in fact corrected some I managed to get wrong, so I am fairly sure it's correct.

Unfortunately, the full Euler Project problem asks for

head [n | n <- [1..], (p n) `mod` 1000000 == 0]

and this way of calculating p' and thence p is up there with the canonical recursive Fibonacci series function for slow. We need a faster way to evaluate p', or even better a fast way to calculate p without having to bother with p'.

So we generated some lines of output, the i^th line being map (p' i) [1..i], Here it is, left justified:

[1]
[1,1]
[1,1,1]
[1,2,1,1]
[1,2,2,1,1]
[1,3,3,2,1,1]
[1,3,4,3,2,1,1]
[1,4,5,5,3,2,1,1]
[1,4,7,6,5,3,2,1,1]
[1,5,8,9,7,5,3,2,1,1]
[1,5,10,11,10,7,5,3,2,1,1]
[1,6,12,15,13,11,7,5,3,2,1,1]
[1,6,14,18,18,14,11,7,5,3,2,1,1]
[1,7,16,23,23,20,15,11,7,5,3,2,1,1]
[1,7,19,27,30,26,21,15,11,7,5,3,2,1,1]
[1,8,21,34,37,35,28,22,15,11,7,5,3,2,1,1]
[1,8,24,39,47,44,38,29,22,15,11,7,5,3,2,1,1]
[1,9,27,47,57,58,49,40,30,22,15,11,7,5,3,2,1,1]
[1,9,30,54,70,71,65,52,41,30,22,15,11,7,5,3,2,1,1]
[1,10,33,64,84,90,82,70,54,42,30,22,15,11,7,5,3,2,1,1]
[1,10,37,72,101,110,105,89,73,55,42,30,22,15,11,7,5,3,2,1,1]
[1,11,40,84,119,136,131,116,94,75,56,42,30,22,15,11,7,5,3,2,1,1]

And here it is right justified:

[1]
[1,1]
[1,1,1]
[1,2,1,1]
[1,2,2,1,1]
[1,3,3,2,1,1]
[1,3,4,3,2,1,1]
[1,4,5,5,3,2,1,1]
[1,4,7,6,5,3,2,1,1]
[1,5,8,9,7,5,3,2,1,1]
[1,5,10,11,10,7,5,3,2,1,1]
[1,6,12,15,13,11,7,5,3,2,1,1]
[1,6,14,18,18,14,11,7,5,3,2,1,1]
[1,7,16,23,23,20,15,11,7,5,3,2,1,1]
[1,7,19,27,30,26,21,15,11,7,5,3,2,1,1]
[1,8,21,34,37,35,28,22,15,11,7,5,3,2,1,1]
[1,8,24,39,47,44,38,29,22,15,11,7,5,3,2,1,1]
[1,9,27,47,57,58,49,40,30,22,15,11,7,5,3,2,1,1]
[1,9,30,54,70,71,65,52,41,30,22,15,11,7,5,3,2,1,1]
[1,10,33,64,84,90,82,70,54,42,30,22,15,11,7,5,3,2,1,1]
[1,10,37,72,101,110,105,89,73,55,42,30,22,15,11,7,5,3,2,1,1]
[1,11,40,84,119,136,131,116,94,75,56,42,30,22,15,11,7,5,3,2,1,1]

We see some patterns.

p' n 2 == n `div` 2 for n > 1. There are n `div` 2 possible piles of size two, and each is either there or split into two piles of size one, so that makes sense.
Check out that trailing portion. There's a steadily growing hunk of the end that never changes. It's because the maximum possible pile of what's left is no longer constrained, so past a certain value of n, p' n (n - m) = p m.

Let's check that last one by looking at the initial hunk of map p [1..]:

[1,2,3,5,7,11,15,22,30,42,56,77,101,135,176,231,297,385,490,627]

So, it would seem that p n == p' (2 * n) (2 * n - n) == p' (2 * n) n. That gets rid of the sum at the p end, but adds to the work on the p' side; it's not any faster, and maybe even a little slower. Worse yet, this isn't like factorial, where the number of trailing zeroes never shrinks--we're liable to end up using Integer instead of Int for the values the problem wants, which will make things slower still.

What we're really doing is counting "integer partitions". The integer being partitioned is the total number of coins, and each pile is represented by the number of coins in it (all the coins are alike). I will have to study this further.

UPDATE: maybe we can generalize the argument we did for p' n 2. What can we say about p' n k? There are n `div` k possible piles of size k, all right, but the leftovers have more than one possible state, and it's not as easy to enumerate them so that the pile sizes stay monotonically non-increasing.

Monday, November 04, 2013

Data Structures

The conventional wisdom on data structures in functional languages is that immutability costs you. It was a big advance when Okasaki came up with algorithms that have an amortized complexity for lazy functional languages the same as the non-amortized complexity for the data structures with destructive update.

Work is going on in that area; check out Edward Kmett's post "Part I: Deamortized ST" about how to come up with a way to make the amortization unnecessary. (It's a bit reminiscent of Asimov's essay "Behind the Teacher's Back".) I'm hoping that will ultimately mean Haskell will be usable in still more situations.

(Actually, I should just say "read Edward Kmett's blog" period. It's all very good stuff.)

Saturday, November 02, 2013

Let ghc help you write your code--must {read, watch} from Matthew Brecknell

Suppose you had no idea what function composition should do, but you were given the type:

(.) :: (b -> c) -> (a -> b) -> a -> c

and had to define it:

(f . g) x = ???

We want a c. The only way we know to get a c is with f, which will give us a c if we give it a b. The only way we're given to get a b is with g, which will give us a b if we give it an a. Hey, x is an a!

(f . g) x = f (g x)

is thus the only way to honor the declaration.

Yes, this is a simple example... but not only can the method be applied more generally, you can get ghc to help you with error messages of the form "I expected a here but {you gave me something else, I can't infer that}". For details and a video working through examples, check out Matthew Brecknell's blog post "Hole-Driven Haskell".

Sunday, October 20, 2013

A new problem

OK, let's try a new problem this time: Project Euler #92. I happened across a YouTube video of a "live coding" session by Daniel Silverstone. It's worth your while. I admire the courage of someone willing to show such an unedited session, typos and all, and besides, the video shows just how expressive Haskell is.

So, having seen Mr. Silverstone's video, this is another case in which I can't pretend to having had all the insights myself. I will write the code, though; it's not copy and paste.

The problem has to do with a function that maps n to the sum of the squares of the digits of n expressed in base ten. I can't quite bring myself to use show, so...

sumSqDigits :: Int -> Int
sumSqDigits n = sumSqDigits' n 0
    where sumSqDigits' n !s
           | n < 10    = n * n + s
           | otherwise = let (q, r) = divMod n 10 in sumSqDigits' q (r * r + s)

Now, it can be shown that for non-zero n, iterate sumSqDigits n will "end" with one of

repeat 1
concat $ repeat [89, 145, 42, 20, 4, 16, 37, 58]

So, Project Euler problem #92 is the musical question "how many of the positive integers less than ten million are of the second kind?"

We'll borrow Mr. Silverstone's function name...

terminator :: Int -> Int
terminator n
| n == 1 || n == 89 = n
| otherwise = terminator $ sumSqDigits n

Well, then just type

length . filter (== 89) $ map terminator [1...9999999]

and you're done, right? Well, you are if you're willing to wait. It is, after all, evaluating sumSqDigits rather a lot. Mr. Silverstone uses the State monad to memoize terminator and speed things up quite a bit.

How best to do this? There are only two values, so one could argue that we're wasting space saving terminator values; we can recast the function as

terminatesWith89 :: Int -> Bool
terminatesWith89 n = n == 89 || terminatesWith89 (sumSqDigits n)

which in turn suggests a bitmap, since we have a bound on the values we're looking at. Ten million bits isn't all that much--at least not these days. But we can do better than that. After all, sumSqDigits takes an n-digit number to at most 81 * n. We're just interested in numbers of up to seven digits, so we need only memoize up to 7 * 81 = 567. Seventy-one bytes beats 1.2 megabytes any day.

For that matter, we can decide on how much space we're willing to devote to it, and just apply sumSqDigits until it's in range for our bitmap, which won't take too long. Say we start with n = a googol - 1. One hundred digits, so sumSqDigits n is at most 8100. That's four digits, so sumSqDgits of that is at most 243, and we're good to go.

With that it didn't take long to write up a 31-line (including "declarations" and blank lines) Haskell program that, when compiled with ghci -O2, solved the problem in a little over three seconds. We wrote a little helper function that followed the problem terminology:

chain :: Int -> [Int]
chain n = iterate sumSqDigits n

It proved nice to use along with dropWhile... but could we make it better? Well, we might have been able to speed up the generation of our bitmap, because everything on the same chain has the same terminator value, so we could have filled in a whole chain at once.

(Come to think of it, a chain that ends with 1, 1, 1, ... has to get there by way of a power of ten. Other values can get there, e.g. 86 (64 + 36 = 100); is there some way they can be characterized? Maybe there's a closed form solution.)

Monday, September 30, 2013

You call that a cleanup?

OK, over on Reddit they make a good point, echoing Simon Thompson in the intro to the second edition of Haskell: The Craft of Functional Programming. He mentions there emphasizing the higher-order functions of the Standard Prelude because (I'm paraphrasing here, and working from memory) otherwise students tend to stay at the lower level, churning out the same recursive schema over and over again... and I'm certainly People's Exhibit #1 of that with bdigits and lastLE. I mean, good grief.

So, let's review the main idea: given two Integer values n and b, where b > 1, we want to know how many digits it takes to write n in base b. The thing is that n may be big, so rather than the stock counting digits one at a time, comparing against b each time, we compare with the values of iterate square b, i.e. [b ^ (2 ^ i) | i <- [0..]]. Only a finite number will be less than or equal to n. If none are, then obviously one base b digit will do. Otherwise, divide by the last one, which will have the form b ^ (2 ^ j) for some j, add 2 ^ j to a running total of digits required, and do it again.

What drove us down the road we took is that one's first thought,

takeWhile (<= n)

has to copy a prefix of the list of powers of b, and we really only care about the last element of that prefix.

So perhaps what we need is a foldWhile. That would let us prime the pump with what we want for the base case, and walks the list rather than chopping off a prefix.

More when I have time

Monday, August 05, 2013

lastLE cleanup

Remember lastLE? Back when we were trying to determine how many digits/bits it takes to represent a non-negative Integer, we used it to avoid copying a prefix of a list just to grab the value at its end.

lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)

lastLE n xs =
let lastLE' xs prevVal prevIndex
     | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
     | otherwise    = if prevIndex < 0 then Nothing
                 else Just (prevVal, prevIndex)
in lastLE' xs (-1) (-1)

It still bugs me. First, the mixed guards and if/else are less than idiomatic Haskell:

lastLE n xs =
let lastLE' xs prevVal prevIndex
     | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
           | prevIndex < 0 = Nothing
     | otherwise     = Just (prevVal, prevIndex)
in lastLE' xs (-1) (-1)

Second, the head and tail are clumsy:

lastLE n xs =
let lastLE' (x:xs) prevVal prevIndex
     | x <= n        = lastLE' xs x (prevIndex + 1)
           | prevIndex < 0 = Nothing
     | otherwise     = Just (prevVal, prevIndex)
in lastLE' xs (-1) (-1)

Third, that accumulator should be strict, so pretend there's a ! in front of prevIndex in that let.

I think I'd rather see Nothing as the otherwise case.

lastLE n xs =
let lastLE' (x:xs) prevVal !prevIndex
     | x <= n         = lastLE' xs x (prevIndex + 1)
           | prevIndex >= 0 = Just (prevVal, prevIndex)
     | otherwise      = Nothing
in lastLE' xs (-1) (-1)

Come to think of it, we can be consistent in our use of pairs:

lastLE n xs =
let lastLE' (x:xs) !prev@(val, index)
     | x <= n         = lastLE' xs (x, index + 1)
           | prevIndex >= 0 = Just prev
     | otherwise      = Nothing
in lastLE' xs (-1, -1)

Now, one more thing. Can I get rid of the magic numbers there, the -1s? The -1 as initial index needs to be there to get the correct position in the list of powers of the base. The initial val will never be used. If the whole pair is strict, though, we couldn't pass in ⊥ ("bottom", the undefined value, which is undefined in Haskell). Perhaps

lastLE n xs =
let lastLE' (x:xs) prev@(val, !index)
     | x <= n     = lastLE' xs (x, index + 1)
           | index >= 0 = Just prev
     | otherwise = Nothing
in lastLE' xs (undefined, -1)

would be the way to go. Hey! We never even use that value here--it's used in the caller--so we can write

lastLE n xs =
let lastLE' (x:xs) prev@(_, !index)
     | x <= n     = lastLE' xs (x, index + 1)
           | index >= 0 = Just prev
     | otherwise = Nothing
in lastLE' xs (undefined, -1)

and it should work (sure enough, it does), as well as expressing explicitly that we don't use it. Well, that's one magic number down, at least.

Sunday, August 04, 2013

Small things...

Good grief... I've let this go for a month. (UPDATE: No, I haven't. It helps to read the dates carefully.)

An option in pattern matching that I wasn't fully aware of: you can use _ when you don't care about what is there, and have no need to refer to it. I guess I did use it on the ByteString readInteger that returns Maybe(Integer, ByteString):

process line = case map B.readInteger (B.words line) of
               [Just (m, _), Just (n, _)] -> numXs m n
               _                          -> -1

But I didn't realize that I could use it in

ysInRange m n d
    | nVal == n = nPos - mPos + 1
    | otherwise = nPos - mPos
    where (_,    mPos) = findFirst' m
          (nVal, nPos) = findFirst' n
          findFirst' x = case findFirst (Ge x, Any) (dDigitYTree d) of
                             Just (Max i, Max j) -> (i, j)
                             Nothing             -> (tenToThe d, numNDigitYs d)

because we need only check for an exact match at the high end.

Saturday, July 27, 2013

Look for elegance

Check out nomeata's blog post "On taking the last n elements of a list". There's the obvious

takeLast n = (reverse . take n . reverse)

but all that reversing is a lot of work. The temptation is to think imperative and haul out the heavy artillery... but then he realized how it could be done efficiently and idiomatically. I won't put a spoiler here; read the original.

Thursday, July 04, 2013

"Evelyn was a modified dog..."

So, how can we modify the tree to do what we'd like?

As the code stands, it's set up to say that if a is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search. What we want to say is that if you have some type a that has a function key :: a -> b where b is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search based on the result of key. Then, if we have a list of (palindrome, position) pairs, our key function is just fst (the function that gives you the first item in a pair). Now, how to express that in Haskell?

UPDATE: come to think of it, wouldn't that subsume the existing cases? For them,

let key = id

UPDATE: OK, I should have remembered. You can create relationships between type classes. (Take a look at this diagram of relationships among types and type classes in the Standard Prelude.) So, it looks like I can say something like this:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

class Semilattice a where
meet :: a -> a -> a

class Semilattice k => Keyed r k | r -> k where
key :: r -> k

The first type class definition is straight from Mr. van Laarhoven's code; a type is a Semilattice type if it has a meet function which satisfies the semilattice requirements for meet.

Keyed is a "multiparameter type class"; it sets up a relationship between a type r (intended to suggest "record") and a type k (intended to suggest "key"). "r -> k" is a "functional dependency". Those two features aren't in the Haskell 98 standard, but GHC supports them (if they're enabled in a pragma) and I suspect they're in Haskell'.

(Historical aside: you might not realize that the preceding paragraphs use terms that originate with Algol 68: "standard prelude" and "pragma". Algol 68 is a language that suffered from a lot of undeserved bad press; you really should read C.H. Lindsey's excellent paper on its history, and check out Algol 68 Genie. For those of you interested in functional programming, I should note that partial parametrization was up for addition to the standard and is implemented in Algol 68 Genie.)

So, I will proceed on this basis. It should be a simple rewrite of the Semilattice tree code, though I wish there were a way to have one version of it to do everything; code reuse is a Good Thing. The above lines make it past ghci, so I hope it's a good start. (Then, of course, the question is whether it really does help the Fair and Square program! OTOH, in a way I don't care, because the real goal was to learn more Haskell, not to mention that I was just lucky that the original blog post gave an example in which the Semilattice type was a pair.)

Tuesday, July 02, 2013

What next?

Yesterday it occurred to me that the profiling output might be more accurate if I just asked for -p, rather than always asking for the heap tracking. The first part of the profile output:

    Mon Jul 1 22:23 2013 Time and Allocation Profiling Report (Final)

       ultimatepalindrome20 +RTS -p -RTS

    total time =        0.17 secs   (167 ticks @ 1000 us, 1 processor)
    total alloc = 116,700,576 bytes (excludes profiling overheads)

COST CENTRE            MODULE                %time %alloc

fromList.(...)         SemilatticeSearchTree 19.2   26.8
ysPair.noTwos          Main                   11.4    8.9
floorSqrt.floorSqrt'.y Main                   10.2    5.6
choices                Main                    7.8   10.7
process                Main                    7.2   11.0
fromList               SemilatticeSearchTree   6.6    0.6
ysPair.spread.(...)    Main                    4.8    2.4
ysPair.noTwos'         Main                    4.2    4.2
meet                   SemilatticeSearchTree   3.6    0.0
satisfy                SemilatticeSearchTree   3.0    0.0
main                   Main                    1.8    1.0
numYs                  Main                    1.8    0.1
ysPair.pairSum         Main                    1.8    3.5
makeYTree              Main                    1.8    4.1
bDigits                Main                    1.8    0.0
meet                   SemilatticeSearchTree   1.8    3.5
mkBranch               SemilatticeSearchTree   1.8    3.2
ysPair.noTwosChoices   Main                    1.2    0.7
bDigits.bDigits'       Main                    1.2    1.9
bound                  SemilatticeSearchTree   1.2    0.0
ysPair.noTwos'.base    Main                    0.6    1.2
ysPair.spread          Main                    0.6    2.5
ysPair.twoNPlus1List   Main                    0.0    2.9

The cost of ysPair is spread out over several lines; add them up and you get 32.4% of the time (choices count; there's only one caller) and 37% of the allocation... but then, the rest of the profile does the adding for us; let's check it out. OK, I was pretty close. fromList with its descendants takes up about another third of the time and allocation.

floorSqrt and ceilSqrt should be low-hanging fruit. As we've said before, it converges quadratically, doubling the number of valid bits each time around... but for numbers of up to 330 bits, that's as many as nine iterations starting from just one good bit as floorSqrt does. So, the temptation is to use the double precision square root directly where possible, and use it as a first approximation otherwise. With 52 good bits as a starting point, three iterations will do the job for values up to a googol.

On the other hand, coming up with some way to use the semilattice search trees to just search on one value (and only store the maximum value searched for as the meet!), with the rest of the node along for the ride, would save time and memory and be more generally applicable, and would help me with Haskell in general. That's the way to go.

Saturday, June 29, 2013

Dueling with the input

Still thinking about how to keep those combinations (or palindromes, now that we see we can avoid some sorting if we go ahead and generate them) around only as long as necessary. One thing that's kind of tempting is to take advantage of what we know about when we can reuse these values--we can generate them for one of 2 * k and 2 * k + 1 and then reuse them (with some modification) for the other. So, why not generate the trees two at a time?

"Well, you don't know that you'll need both of them," you object. "The Code Jam people could make you waste your time by generating input that only asks about ranges corresponding to Ys with even numbers of digits (or odd, take your pick)." And you're right, they could. Our attempts to speed things up can be subverted... but I think this one is worth a try.

At least initially, I'd be inclined to generate a list of pairs of trees, with the first being for 2 * k digits, the second being for 2 * k + 1. (Yes, one-digit Ys are a special case again.) More news as it happens.

UPDATE: new best time output:

real    0m0.192s
user    0m0.160s
sys     0m0.028s

Total allocation is down to not quite 117 MB, compared with 124 MB before, and "maximum residency" is around 9.6 MB. GC time is down to 41.2% of total execution time. I guess that's better than almost half, but that still seems high. One thing that is gratifying: the first time I used -sstderr to save GC info, it listed almost 160 MB copied during GC and over 16 MB maximum residency. Now the maximum copying during GC is just 63 MB, less than half what it was before.

And that's using spread, which is expensive because it does Integer divides. We'll do that next, and then I really should try to ditch the gratuitous meets of values in the trees that we never search for. That should be pure gravy.

UPDATE: spread is worth it to avoid having to sort again. (Before we weren't taking advantage of that, so spread was overhead.)

Improvements

Chaddaï Fouché kindly responded to a query I put out on the haskell-beginners mailing list, suggesting:

using iterate
using Data.Vector or MemoTrie instead of lists where possible

What's iterate, you ask?

iterate :: (a -> a) -> a -> [a]

Give it a function f and a starting value s and it will hand you back

[s, f s, f . f s, f . f . f s, ...]

So, for example, rather than

powersOfTen = map (10 ^) [0..]

we can write

powersOfTen = iterate (10 *) 1

and instead of

bigTwos = map (2 ^) powersOfTwo

we can write

square n = n * n

bigTwos = iterate square 2

Not shabby, eh?

About those Vectors: they're a data structure that makes for O(1) (i.e. constant time) indexing, as opposed to the O(n) time for lists. There are two flavors: Data.Vector and Data.Vector.Unboxed. The unboxed version has lower overhead, but can't be used on all types.

So I added

import qualified Data.Vector as V

and changed a little bit of code:

          memoPair   = V.generate halfN (\i -> tenToThe i + tenToThe (n - 1 - i))
          pair i     = memoPair V.! i
          pairSum v = V.foldl' (+) shell (V.map pair v)

choices :: Int -> Int-> [V.Vector Int]

m `choices` n
    | n == 0           = [V.empty]
    | m == n           = [V.enumFromStepN m (-1) m]
    | otherwise        = [m `V.cons` c | c <- (m - 1) `choices` (n - 1)]
                         ++ ((m - 1) `choices` n)

and indeed it helped. I hope it will help more soon; I can't have an unboxed vector of Integers (or is that a vector of unboxed Integers?), so if I want the choices Ints unboxed, I can't use the unboxed vector map; I'll have to roll my own function for that.

A bigger payoff, for now, had to do with sorting, or the minimizing thereof. It's easy to make oneTwos come out in order; noTwos is the gotcha. We did the folliowing:

nDigitYs n = (merge (oneTwos n) (noTwos n)) ++ twoTwos n
    where twoTwos n
              | even n    = [twoShell]
              | otherwise = [twoShell, twoShell + tenToThe halfN]
              where twoShell = 2 * shell
          oneTwos n
              | even n    = []
              | otherwise = map (+ (shell + 2 * tenToThe halfN))
                                (0 : map pair [halfN - 1,halfN - 2..1])
          noTwos n
              | even n    = base
              | otherwise = merge base [p + tenToThe halfN | p <- base]
              where base = sort $ map pairSum (noTwosChoices !! (halfN - 1))

and poof! The profiler output shows total time as 174 ticks, 0.17 seconds. The best time output is now

real    0m0.214s
user    0m0.176s
sys     0m0.028s

No reduction in the percentage of time used for garbage collection, though.

You know... now it may be worth keeping palindromes rather than combinations, because we would only have to sort them once instead of twice--and that might cut the memory usage as well.

Sunday, June 23, 2013

So, how am I being wasteful?

Let me count the ways:

We're keeping the combinations for noTwos (2 * k) and noTwos (2 * k + 1)around forever. How can we let go of them when both those have been calculated? The simplest way would be to generate nDigitYs (2 * k) and nDigitYs (2 * k + 1) at the same time, but that has the potential to make us do extra work. This is Haskell, after all; laziness is a virtue. There has to be a better way.
What is up with pairSum? (I renamed it to fit convention.) It's using up 11% of the time and nearly 12% of storage allocations; the graph shows it accumulating two megabytes of heap. Are we piling up thunks?

Alas, this isn't as simple as the Real World Haskell Chapter 25 example. There you know that accumulating and counting items on a list, which is all the example program does, ought to be doable in a constant amount of memory. We, on the other hand, are accumulating things as we go. Ideally, all we accumulate would be

The trees for the various numbers of digits that we actually have to search in
The memoized partial sums of numNDigitYs

Of course, we are also keeping around powers of ten and two, and, right now, the lists of lists of Ints for noTwos.

The immediate temptation is to make numXs take three parameters and return two values, with the added input and output being the extranea that we want to keep around for a while (initially empty, of course, and the added output of one call being passed in as the added input for the next). That seems ugly, though; the exact opposite of information hiding. I'm sure someone's thought of this sort of situation and dealt with it; I just have to learn about it. In the meantime, there's still pairSum to optimize.

Saturday, June 22, 2013

In the great tradition of nibbling at the edges...

...and because I am still puzzling over why pairsum is grabbing so much RAM, let's contemplate floorSqrt. We're just calling it 2,000 times, and yet it's collectively taking up over six percent of the CPU time and five percent of allocation?!

Let's remind ourselves of floorSqrt (tweaked because I forgot that I'd memoized powers of two)

floorSqrt :: Integer -> Integer

floorSqrt 0 = 0
floorSqrt n = floorSqrt' (twoToThe ((1 + bitsIn n) `div` 2))
    where floorSqrt' x =
              let y = (x + n `div` x) `div` 2
              in if y >= x then x else floorSqrt' y

This is a special case of the Newton-Raphson method that was known to the ancient Greeks. We treat zero specially to avoid one of the gotchas of Newton-Raphson, but aside from that, this particular example is well-behaved, and has what they call quadratic convergence: once you're sufficiently close, each successive guess is good to twice as many digits as the one before. Our first guess is good to one bit, so the next is good to two bits, then four, then eight, etc. We're dealing with Xs in ranges up to a googol, or around 2³³⁰, which would me no more than nine iterations per value. Checking the profiling output, we see a total of 7,175 + 7,472 = 14,647 calls to floorSqrt', which is within our constraint, but darned close to it, so the inputs are tending to the high end of the range.

I kind of hate to cheat, but let's see what happens with

floorSqrt n = floorSqrt' $ (floor. sqrt . fromIntegral) n

[pause to compile, run, check profile output, and check... uh-oh.]

What happens is that it doesn't give the correct results; the output is not correct, which is weird. You'd think that would give you 52 good bits, and thus just need three iterations, but then, look at that termination condition. There has to be something about it that needs that first trial value to make it work, so you can't just plug in another, even better, guess. Taking the first value from the large input, the floorSqrt we've been using returns

10000010000100100100100001000000

while using the supposedly better first guess gives us

10000010000100100017503961350144

We could change the function to take ceil log2 (# bits in n / 52) iterations, but I think I should get back to dealing with memory usage and pairsum. (Even with that change you probably would end up flipping a coin for whether you got the floor or the ceiling of the square root of n.)

Thursday, June 20, 2013

Now that we have better data...

...let's see what's happening.

First, the single sample point faked us out, making us think we were using less RAM than we really were. We are chewing up pretty nearly ten megabytes--still less than that C program, but a bit disappointing.

Or maybe more disappointing than we thought. Here's the output describing heap usage from a run with the options +RTS -sstderr:

     202,392,800 bytes allocated in the heap
      84,601,736 bytes copied during GC
      12,550,176 bytes maximum residency (8 sample(s))
         220,824 bytes maximum slop
              30 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed) Avg pause Max pause
Gen 0       387 colls,     0 par    0.10s    0.10s     0.0003s    0.0009s
Gen 1         8 colls,     0 par    0.06s    0.06s     0.0080s    0.0230s

INIT    time    0.00s ( 0.00s elapsed)
MUT     time    0.19s ( 0.20s elapsed)
GC      time    0.16s ( 0.16s elapsed)
RP      time    0.00s ( 0.00s elapsed)
PROF    time    0.00s ( 0.00s elapsed)
EXIT    time    0.00s ( 0.00s elapsed)
Total   time    0.36s ( 0.36s elapsed)

%GC     time      45.2% (45.3% elapsed)

Alloc rate    1,028,021,469 bytes per MUT second

Productivity 54.8% of total user, 55.0% of total elapsed

Whoa. Nearly half our time spent garbage collecting? There's got to be a way to improve that. (And on the graph, nearly two megabytes taken by pairsum?)

Learning to avoid cabal--or not

Well... last night I thought I'd install EclipseFP, a package for Eclipse to support Haskell development. When you fire it up, it goes looking for packages it wants, and apparently uses cabal to install them. It did; I watched it do so for some time.

It turned out to be a waste of time; when I fired up Eclipse (which I'm rather new to) and clicked on the little lambda over to the left, a window opened up that looked half-drawn and very broken. I suspect that was issues with Eclipse--perhaps I should wipe the latest version that I grabbed and installed, and settle for the ancient version that, for some reason, is what Ubuntu has in its repositories.

OK, so I'll pass on an IDE for Haskell for now, or start up with leksah.

This morning, I had one of those sudden realizations that you get that make you laugh at yourself. Why do those memory usage graphs look like pyramids? Because the default sample interval is 0.1 seconds, and I have the run time down around 0.2 seconds, sort of like taking a sample every couple of minutes and expecting to get an accurate playback of a song.

A peek at Real World Haskell and I see the option to change the interval, so I recompile and run... and it claims that I should link with an option to turn on the RTS capabilities.

What?!

Another compile or two to make sure I did indeed specify the right options... and then some Googling, because the same thing still happened.

From the Google results, I suspect that all those cabal installs pulled in versions of libraries that don't support profiling, and that ghc is pulling them in. Following people's advice, I rm -rf ~/.ghc. No luck, still can't profile.

I have learned one thing: I am not going to let cabal touch my computer ever again. I would have sworn that I specified that packages should just be installed for me, so the rm should have done the trick. I guess I can look for Haskell libraries dated yesterday and delete every single one of them.

UPDATE: BZZT! Turns out the issue comes from trying to use that -i option to override the heap measurement interval. That's what's giving me the problem. I was wrong... and maybe I'll consider using cabal... sometime when I know Haskell much better than I do now.

UPDATE: Found it... since ghc 6.x, you have to compile with -rtsopts=all to be able to use some (OK, most) RTS options, lest the logging they permit be used to breach security. I will have to try to figure out how -i would give you the opportunity to do so over and above the output that -h*, which doesn't require -rtsopts=all, allows. (OK, maybe it would be possible to allocate a lot of RAM or not every [interval], a lot of RAM means a 1, not much means 0.)

hlint

Back in the early days of Unix, when the PDP--11/70's main advantage over the later 8/16-bit 6809 was having separate I/D (instruction and data) space, so that you could have 64K of code with access to 64K of data, the virtue of the Unix Way of small programs that did one thing and did it well was a necessity. One of the things it gave rise to was a separate program, "lint", to check C source code for constructs that might be evidence of a coding error, so that the compiler could concentrate on simply generating code.

Nowadays, C compilers often do some of the checking that was once delegated to lint (though separate lint programs still exist, and are very good--e.g. splint, or Gimpel Software's excellent products).

What about Haskell? Well, for Haskell there's hlint. It will give you advice on the Haskell source that you feed it.

jejones@eeyore:~/src/haskell_play$ hlint ultimatepalindrome14.hs
ultimatepalindrome14.hs:236:13: Warning: Use zipWith
Found:
map showsResult $ zip [1 ..] (map process (tail $ B.lines s))
Why not:
zipWith (curry showsResult) [1 ..] (map process (tail $ B.lines s))

ultimatepalindrome14.hs:237:5: Error: Use .
Found:
mapM putStr $ map ($ "\n") r
Why not:
mapM (putStr . ($ "\n")) r

2 suggestions
jejones@eeyore:~/src/haskell_play$

So there are (at least!) a couple of places I could have written arguably better, easier to read, more idiomatic Haskell. Pretty cool.

What's with that "."? In that context, "." is the function composition operator; given two functions, one of which returns values of the type the other takes as input, composing them gives you a function that applies first one function, then the other. Huh? Better to write it in Haskell:

. :: (a -> b) -> (c -> a) -> (c -> b)

f . g x = f (g x)

or, equivalently,

f . g = \x -> f (g x)

\ isn't a character escape; it's the closest the creators of Haskell could come to the Greek letter lambda, as in Alonzo Church's "lambda calculus". A lambda expression is an "anonymous function"; the way to read the right hand side of that last line is "the function that, given an argument x, returns f (g x)".

(Ironically, the lambda was in turn a sort of best approximation; it was inspired by the use of the circumflex in the notation of Russell and Whitehead's Principia Mathematica. Details here.)

Anyway... there's also a Haskell "style scanner" that one can either just get suggestions from or use in the fashion of indent. I will have to check it out.

Wednesday, June 19, 2013

It's not just an idiom...

In this program I've taken a list of values and fed it to

zip [1..]

You'll recall that zip takes two lists and returns a new list as long as the shortest of the lists handed to it. Each element of the new list is a pair of values at corresponding positions in the lists, so that, for example,

zip [1..] "hiya" == [(1, 'h'), (2, 'i'), (3, 'y'), (4, 'a')]

Remind you of anything you learned to do as a kid? Yes, it's constructing an bijection from a set (here represented as a list) to the first however many counting numbers, aka counting. Little did I know back then that I was preparing for Haskell.

Tuesday, June 18, 2013

Further restructuring

It occurred to me that I could push the code even further, making the connection between counting and generation more apparent. Recall the counting code:

numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

Here's the new generation (which sounds like some cheesy 60s thing--sorry!); we've gotten rid of justOnes.

nDigitYs 1 = [1,2,3]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
    where halfN = n `div` 2
          pair i = tenToThe i + tenToThe (n - (i + 1))
          twoTwos n
              | even n    = [twoTwosNoOnes]
              | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe halfN]
              where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2
          oneTwos n
              | even n    = []
              | otherwise = map (+ common)
                                 (0 : [pair i | i <- [1..halfN - 1]])
              where common = pair 0 + 2 * tenToThe halfN
          noTwos n
              | even n    = base
              | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
              where pairsum xs = foldl' (+) (pair 0) (map pair xs)
                    base       = map pairsum (noTwosChoices !! (halfN - 1))

noTwosChoices = [concat [n `choices` k | k <- [0..min 3 n]] | n <- [0..]]

We explicitly generate the oneTwos values in a way that makes clear that there are n `div` 2 of them, and it's similarly clear that numNoTwos is correct. Life is beautiful, right?

Well, not quite. We're now eating up ten megabytes of memory instead of six and change; total allocation is up from 127 MB to 140 MB, and total time, according to the profiling output, is up by .03 seconds. Saving lists of lists of Ints will definitely add to the allocation, so we would expect that. We're really beating on pair and pairsum, so that's probably where to look to get resource usage pared (no word play intended) back down.

UPDATE: We didn't memoize pair; doing that took us back down to 0.2 seconds, cutting the difference back down to .01 seconds. Total allocation is back down to 130 MB, but memory is up somewhere between 10 and 11 MB.

UPDATE: I wonder whether I forgot something, perhaps -fllvm. Recompiling and rerunning shows 0.19 seconds run time in profiler output, and just about 8 MB of memory usage (as opposed to total allocation, still around 130 MB).

I hate to do something as crass as packing three values in an Int or maybe an Int32, but it's grating to keep a list of lists of Ints hanging around, too. I'll give it a try. (The results were slower and used more RAM. So much for that.)