Posts

Showing posts from 2013

Someone has done this before...

Happened across it while looking for Haskell-related blogs to link to: from Derek Wyatt's blog, a post titled "Haskell sequence over functions -- explained" gives this example:

sequence [(> 4), (< 10), odd] 7

gives the result we would get from our wonkyMap with the same arguments:

[True, True, True]

What is this sequence?

:t sequence
sequence :: Monad m => [m a] -> m [a]

If m is a monad, then sequence will map a [m a] to a m [a].

So, our functions must have the type m a for some m and a. Let's inquire with ghci once again.

:t sequence [(> 4), (< 10), odd]
sequence [(> 4), (< 10), odd] :: Integral a => a -> [Bool]

Aha. That reminds me of the exercise in Typeclassopedia: show that (->) r is a functor.  (WTF? That's the type of functions that take an argument of type r.) Turns out that not only is it a functor, it's a monad; Typeclassopedia refers to it as the "reader monad".

Unfortunately, we have dueling definitions here. M…

Only a few years late...

So I was looking at a list of requirements, trying to make sense of them and make sure I would come up with something that would satisfy them, when it occurred to me that you should bypass the middleman. Requirements should be written in the form of tests that pass iff the software satisfies them.

Turns out I'm a few years late. Dan North has already come up with the notion of "behavior-driven development". There are now parsers for "ubiquitous languages" in which one writes descriptions of how a piece of software should behave if you do certain things, and from those descriptions one generates tests. One example: Cucumber. (Why Cucumber? I think it's from the convention of having output from tests that fail come out in red while tests that pass generate output in green--like a cucumber.)

Haskell has some very neat facilities for testing, e.g. QuickCheck and HUnit. Is there a behavior-driven development package for Haskell? Why yes, there is: HSpec. Sweet.

You'd think someone's done this before

Saw a link to an Introduction to Haskell course set up by a student at the University of Virginia (which inspired others), and saw one of the homework assignments: write a function

strong :: String -> Bool

which returns True iff the string passed to it is a "strong" password, which for purposes of the assignment means
it's at least 15 characters longit includes upper case lettersit includes lower case lettersit includes digits Easy enough to write, especially if you import Data.Char, but the lesson is about higher-order functions, and you're urged to use the things taught in the lesson (among which are the character classification functions in Data.Char). So it occurred to me that it would be good to write the function in a way that makes it easy to add other constraints--perhaps it shouldn't be in some list of bad password choices, say.
So, we'd like to write it to take a list of String -> Bool, apply them to the String, and confirm that they all return …

Project Euler #14 revisited

Recently I came a cross a link from... aargh, was it Reddit or Hacker News? I should have saved the link. In any case, it was from someone who had decided to try to optimize a solution to Project Euler #14 in Haskell, and it induced me to revisit the program I'd written for it. Euler #14 asks the musical question, which number less than a million takes the longest for its Collatz sequence to get to 1?

{-# LANGUAGE BangPatterns #-}

import Data.List
import Data.Ord

collatzChainLen :: Int -> Int
collatzChainLen n = collatzChainLen' n 1
    where collatzChainLen' n !l
            | n == 1    = l
            | otherwise = collatzChainLen' (collatz n) (l + 1)
          collatz n = if even n then n `div` 2 else 3 * n + 1

pairMap :: (a -> b) -> [a] -> [(a, b)]
pairMap f xs = [(x, f x) | x <- xs]

main :: IO ()
main = print $ fst (maximumBy (comparing snd) (pairMap collatzChainLen [1..999999]))

Compiled with ghc -O2, about 3.5 seconds on my 2.8 GHz Athlon 64.
Of course, the th…

Project Euler #78

If you have a pile of n coins, there's some number of distinct ways to divide them up into piles; let's refer to the function that maps a number of coins to the number of ways to divide them up into piles as p. The example they give is for five coins; as you can see, p 5 == 7.

OOOOO
OOOO O
OOO OO
OOO O O
OO OO O
OO O O O
O O O O O

Things to note:
the coins are all alike; it's not like you have different denominations that could distinguish one pile from another with the same number of coins. order doesn't matter; OOO OO and OO OOO don't count as different. One's first inclination is to say "If you create a pile of m coins, that leaves n - m, and there are p (n - m) ways to divide them up, right?" Not exactly; if you don't confine yourself to piles of m or fewer coins, you'll violate (2). So we need what we'll call p', where p' n m is the number of ways to divide up n coins into piles of size m or less. Then you have

p :: Int -> Int
p n = …

Data Structures

The conventional wisdom on data structures in functional languages is that immutability costs you. It was a big advance when Okasaki came up with algorithms that have an amortized complexity for lazy functional languages the same as the non-amortized complexity for the data structures with destructive update.

Work is going on in that area; check out Edward Kmett's post "Part I: Deamortized ST" about how to come up with a way to make the amortization unnecessary. (It's a bit reminiscent of Asimov's essay "Behind the Teacher's Back".) I'm hoping that will ultimately mean Haskell will be usable in still more situations.

(Actually, I should just say "read Edward Kmett's blog" period. It's all very good stuff.)

Let ghc help you write your code--must {read, watch} from Matthew Brecknell

Suppose you had no idea what function composition should do, but you were given the type:

(.) :: (b -> c) -> (a -> b) -> a -> c

and had to define it:

(f . g) x = ???

We want a c. The only way we know to get a c is with f, which will give us a c if we give it a b. The only way we're given to get a b is with g, which will give us a b if we give it an a. Hey, x is an a!

(f . g) x = f (g x)

is thus the only way to honor the declaration.

Yes, this is a simple example... but not only can the method be applied more generally, you can get ghc to help you with error messages of the form "I expected a here but {you gave me something else, I can't infer that}". For details and a video working through examples, check out Matthew Brecknell's blog post "Hole-Driven Haskell".

A new problem

OK, let's try a new problem this time: Project Euler #92. I happened across a YouTube video of a "live coding" session by Daniel Silverstone. It's worth your while. I admire the courage of someone willing to show such an unedited session, typos and all, and besides, the video shows just how expressive Haskell is.

So, having seen Mr. Silverstone's video, this is another case in which I can't pretend to having had all the insights myself. I will write the code, though; it's not copy and paste.

The problem has to do with a function that maps n to the sum of the squares of the digits of n expressed in base ten. I can't quite bring myself to use show, so...

sumSqDigits :: Int -> Int
sumSqDigits n = sumSqDigits' n 0
    where sumSqDigits' n !s
           | n < 10    = n * n + s
           | otherwise = let (q, r) = divMod n 10 in sumSqDigits' q (r * r + s)

Now, it can be shown that for non-zero n, iterate sumSqDigits n will "end" with…

You call that a cleanup?

OK, over on Reddit they make a good point, echoing Simon Thompson in the intro to the second edition of Haskell: The Craft of Functional Programming. He mentions there emphasizing the higher-order functions of the Standard Prelude because (I'm paraphrasing here, and working from memory) otherwise students tend to stay at the lower level, churning out the same recursive schema over and over again... and I'm certainly People's Exhibit #1 of that with bdigitsand lastLE. I mean, good grief.

So, let's review the main idea: given two Integer values n and b, where b > 1, we want to know how many digits it takes to write n in base b. The thing is that n may be big, so rather than the stock counting digits one at a time, comparing against b each time, we compare with the values of iterate square b, i.e. [b ^ (2 ^ i) | i <- [0..]].  Only a finite number will be less than or equal to n. If none are, then obviously one base b digit will do. Otherwise, divide by the last one,…

lastLE cleanup

Remember lastLE? Back when we were trying to determine how many digits/bits it takes to represent a non-negative Integer, we used it to avoid copying a prefix of a list just to grab the value at its end.

lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)

lastLE n xs =
    let lastLE' xs prevVal prevIndex
           | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
           | otherwise    = if prevIndex < 0 then Nothing
                                             else Just (prevVal, prevIndex)
    in lastLE' xs (-1) (-1)

It still bugs me. First, the mixed guards and if/else are less than idiomatic Haskell:

lastLE n xs =
    let lastLE' xs prevVal prevIndex
           | head xs <= n  = lastLE' (tail xs) (head xs) (prevIndex + 1)
           | prevIndex < 0 = Nothing
           | otherwise     = Just (prevVal, prevIndex)
    in lastLE' xs (-1) (-1)

Second, the head and tail are clumsy:

lastLE n xs =
    let lastLE' (x:xs) prevVal prevIndex
 …

Small things...

Good grief... I've let this go for a month. (UPDATE: No, I haven't. It helps to read the dates carefully.)

An option in pattern matching that I wasn't fully aware of: you can use _ when you don't care about what is there, and have no need to refer to it. I guess I did use it on the ByteString readInteger that returns Maybe(Integer, ByteString):

process line = case map B.readInteger (B.words line) of
               [Just (m, _), Just (n, _)] -> numXs m n
               _                          -> -1

But I didn't realize that I could use it in

ysInRange m n d
    | nVal == n = nPos - mPos + 1
    | otherwise = nPos - mPos
    where (_,    mPos) = findFirst' m
          (nVal, nPos) = findFirst' n
          findFirst' x = case findFirst (Ge x, Any) (dDigitYTree d) of
                             Just (Max i, Max j) -> (i, j)
                             Nothing             -> (tenToThe d, numNDigitYs d)

because we need only check for an exact match at t…

Look for elegance

Check out nomeata's blog post "On taking the last n elements of a list". There's the obvious

takeLast n = (reverse . take n . reverse)

but all that reversing is a lot of work. The temptation is to think imperative and haul out the heavy artillery... but then he realized how it could be done efficiently and idiomatically. I won't put a spoiler here; read the original.

"Evelyn was a modified dog..."

So, how can we modify the tree to do what we'd like?

As the code stands, it's set up to say that if a is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search. What we want to say is that if you have some type a that has a function key :: a -> b where b is in the type class Semilattice, you can create a SearchTree of as that you can efficiently search based on the result of key. Then, if we have a list of (palindrome, position) pairs, our key function is just fst (the function that gives you the first item in a pair). Now, how to express that in Haskell?

UPDATE: come to think of it, wouldn't that subsume the existing cases? For them,

let key = id

UPDATE: OK, I should have remembered. You can create relationships between type classes. (Take a look at this diagram of relationships among types and type classes in the Standard Prelude.) So, it looks like I can say something like this:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDepe…

What next?

Yesterday it occurred to me that the profiling output might be more accurate if I just asked for -p, rather than always asking for the heap tracking. The first part of the profile output:

    Mon Jul  1 22:23 2013 Time and Allocation Profiling Report  (Final)

       ultimatepalindrome20 +RTS -p -RTS

    total time  =        0.17 secs   (167 ticks @ 1000 us, 1 processor)
    total alloc = 116,700,576 bytes  (excludes profiling overheads)

COST CENTRE            MODULE                %time %alloc

fromList.(...)         SemilatticeSearchTree  19.2   26.8
ysPair.noTwos          Main                   11.4    8.9
floorSqrt.floorSqrt'.y Main                   10.2    5.6
choices                Main                    7.8   10.7
process                Main                    7.2   11.0
fromList               SemilatticeSearchTree   6.6    0.6
ysPair.spread.(...)    Main                    4.8    2.4
ysPair.noTwos'         Main                    4.2    4.2
meet                   SemilatticeSearchTr…

Dueling with the input

Still thinking about how to keep those combinations (or palindromes, now that we see we can avoid some sorting if we go ahead and generate them) around only as long as necessary. One thing that's kind of tempting is to take advantage of what we know about when we can reuse these values--we can generate them for one of  2 * k and 2  * k + 1 and then reuse them (with some modification) for the other. So, why not generate the trees two at a time?

"Well, you don't know that you'll need both of them," you object. "The Code Jam people could make you waste your time by generating input that only asks about ranges corresponding to Ys with even numbers of digits (or odd, take your pick)." And you're right, they could. Our attempts to speed things up can be subverted... but I think this one is worth a try.

At least initially, I'd be inclined to generate a list of pairs of trees, with the first being for 2 * k digits, the second being for 2 * k + 1. (Yes, …

Improvements

Chaddaï Fouché kindly responded to a query I put out on the haskell-beginners mailing list, suggesting:
using iterateusing Data.Vector or MemoTrie instead of lists where possible What's iterate, you ask?

iterate :: (a -> a) -> a -> [a]

Give it a function f and a starting value s and it will hand you back

[s, f s, f . f s, f . f . f s, ...]

So, for example, rather than

powersOfTen = map (10 ^) [0..]

we can write

powersOfTen = iterate (10 *) 1

and instead of

bigTwos = map (2 ^) powersOfTwo

we can write

square n = n * n

bigTwos = iterate square 2

Not shabby, eh?

About those Vectors: they're a data structure that makes for O(1) (i.e. constant time) indexing, as opposed to the O(n) time for lists. There are two flavors: Data.Vector and Data.Vector.Unboxed. The unboxed version has lower overhead, but can't be used on all types.

So I added

import qualified Data.Vector as V

and changed a little bit of code:

          memoPair   = V.generate halfN (\i -> tenToThe i + tenToThe (n …

So, how am I being wasteful?

Let me count the ways:
We're keeping the combinations for noTwos (2 * k) and noTwos (2 * k + 1)around forever. How can we let go of them when both those have been calculated? The simplest way would be to generate nDigitYs (2 * k) and nDigitYs (2 * k + 1) at the same time, but that has the potential to make us do extra work. This is Haskell, after all; laziness is a virtue. There has to be a better way. What is up with pairSum? (I renamed it to fit convention.) It's using up 11% of the time and nearly 12% of storage allocations; the graph shows it accumulating two megabytes of heap. Are we piling up thunks? Alas, this isn't as simple as the Real World Haskell Chapter 25 example. There you know that accumulating and counting items on a list, which is all the example program does, ought to be doable in a constant amount of memory. We, on the other hand, are accumulating things as we go. Ideally, all we accumulate would be
The trees for the various numbers of digits that we actu…

In the great tradition of nibbling at the edges...

...and because I am still puzzling over why pairsum is grabbing so much RAM, let's contemplate floorSqrt. We're just calling it 2,000 times, and yet it's collectively taking up over six percent of the CPU time and five percent of allocation?!

Let's remind ourselves of floorSqrt (tweaked because I forgot that I'd memoized powers of two)

floorSqrt :: Integer -> Integer

floorSqrt 0 = 0
floorSqrt n = floorSqrt' (twoToThe ((1 + bitsIn n) `div` 2))
    where floorSqrt' x =
              let y = (x + n `div` x) `div` 2
              in if y >= x then x else floorSqrt' y

This is a special case of the Newton-Raphson method that was known to the ancient Greeks. We treat zero specially to avoid one of the gotchas of Newton-Raphson, but aside from that, this particular example is well-behaved, and has what they call quadratic convergence: once you're sufficiently close, each successive guess is good to twice as many digits as the one before. Our first guess is …

Now that we have better data...

Image
...let's see what's happening.



First, the single sample point faked us out, making us think we were using less RAM than we really were. We are chewing up pretty nearly ten megabytes--still less than that C program, but a bit disappointing.

Or maybe more disappointing than we thought. Here's the output describing heap usage from a run with the options +RTS -sstderr:

     202,392,800 bytes allocated in the heap
      84,601,736 bytes copied during GC
      12,550,176 bytes maximum residency (8 sample(s))
         220,824 bytes maximum slop
              30 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       387 colls,     0 par    0.10s    0.10s     0.0003s    0.0009s
  Gen  1         8 colls,     0 par    0.06s    0.06s     0.0080s    0.0230s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.19s  (  0.20s elapsed)
  GC      time    0.16s  (  0.16s elapsed)
  RP      time    0.00s…

Learning to avoid cabal--or not

Well... last night I thought I'd install EclipseFP, a package for Eclipse to support Haskell development. When you fire it up, it goes looking for packages it wants, and apparently uses cabal to install them. It did; I watched it do so for some time.

It turned out to be a waste of time; when I fired up Eclipse (which I'm rather new to) and clicked on the little lambda over to the left, a window opened up that looked half-drawn and very broken. I suspect that was issues with Eclipse--perhaps I should wipe the latest version that I grabbed and installed, and settle for the ancient version that, for some reason, is what Ubuntu has in its repositories.

OK, so I'll pass on an IDE for Haskell for now, or start up with leksah.

This morning, I had one of those sudden realizations that you get that make you laugh at yourself. Why do those memory usage graphs look like pyramids? Because the default sample interval is 0.1 seconds, and I have the run time down around 0.2 seconds, sort…

hlint

Back in the early days of Unix, when the PDP--11/70's main advantage over the later 8/16-bit 6809 was having separate I/D (instruction and data) space, so that you could have 64K of code with access to 64K of data, the virtue of the Unix Way of small programs that did one thing and did it well was a necessity. One of the things it gave rise to was a separate program, "lint", to check C source code for constructs that might be evidence of a coding error, so that the compiler could concentrate on simply generating code.

Nowadays, C compilers often do some of the checking that was once delegated to lint (though separate lint programs still exist, and are very good--e.g. splint, or Gimpel Software's excellent products).

What about Haskell? Well, for Haskell there's hlint. It will give you advice on the Haskell source that you feed it.

jejones@eeyore:~/src/haskell_play$ hlint ultimatepalindrome14.hs
ultimatepalindrome14.hs:236:13: Warning: Use zipWith
Found:
  map showsRe…

It's not just an idiom...

In this program I've taken a list of values and fed it to

zip [1..]
You'll recall that zip takes two lists and returns a new list as long as the shortest of the lists handed to it. Each element of the new list is a pair of values at corresponding positions in the lists, so that, for example,

zip [1..] "hiya" == [(1, 'h'), (2, 'i'), (3, 'y'), (4, 'a')]
Remind you of anything you learned to do as a kid? Yes, it's constructing an bijection from a set (here represented as a list) to the first however many counting numbers, aka counting. Little did I know back then that I was preparing for Haskell.

Further restructuring

It occurred to me that I could push the code even further, making the connection between counting and generation more apparent. Recall the counting code:

numNDigitYs 1 = 3
numNDigitYs n = numTwoTwos n + numOneTwos n + numNoTwos n
    where numTwoTwos n = if even n then 1 else 2
          numOneTwos n = if even n then 0 else n `div` 2
          numNoTwos  n = if even n then s else 2 * s
                         where h = n `div` 2 - 1
                               s = sum [h `choose` i | i <- [0..min h 3]]

Here's the new generation (which sounds like some cheesy 60s thing--sorry!); we've gotten rid of justOnes.

nDigitYs 1 = [1,2,3]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
    where halfN  = n `div` 2
          pair i = tenToThe i + tenToThe (n - (i + 1))
          twoTwos n
              | even n    = [twoTwosNoOnes]
              | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe halfN]
              where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2
          oneTwos n
     …