### Palindromes again?

Actually, it's more Haskell than palindromes.

I've been looking over other people's solutions to the "Fair and Square" problem, and they are pretty impressive. As a still-inexperienced Haskeller, I am taking inspiration from then rather than feeling discouraged.

In particular, jagd's solution reminded me that I was not thinking Haskell. Haskell is lazy; values aren't calculated until you require them. That lets you define infinite data structures! You only get into trouble if you do something that requires the whole thing.

For example,

and then ask for

you will get back 1, 2, 3, 4, and 5, and then the function call will hang. You and I realize that

To get the result you want, you will have to write something like

What's

Give it a function that takes an a and returns a Bool, and a list of as, and it will hand you back a prefix of that list--all the items up to, but not including, the first one for which the function returns false.

With that definition,

gives you the

you expect, because we know that [1..] is in ascending order and the square function is monotonically increasing.

(There's nothing magic about

it will print even numbers forever, because

That means that I can simply define the whole infinite list of "fair and square" numbers. To make it possible to get those in a certain range, I will have to generate them in ascending order, so back comes the sort that we omitted for a while. I'm going to do something that clashes a bit with a Haskell convention for patterns in function definitions, because it matches the nomenclature of the problem discussion, in which the "fair and square" numbers are referred to as X and the palindromes they are squares of as Y.

Note that we aren't using

So do infinite data structures provide any advantage? Well, consider that I have yet to actually solve the Code Jam problem as such. Remember that it requires reading a file with ranges of values m and n for which we have to print a line of a given format with the value of

Defining the infinite list of Xs is a form of

Evaluating

will evaluate all the Xs between 0 and 1000 (actually one more past that, because you have to find the stopping place), so that when it comes time to process the next line, all the values between 40 and 500 will have already been determined, so all we have to do is select the values in range.

It's not what you'd want for serious memoization; after all, the input lines

Cool! Of course, I didn't submit these officially--I'm using a portion of someone else's solution, and peeked at the analysis of the problem, and took far longer to write the program than the rules permit, so it would in no way be fair to take credit--but Google let me download the inputs, and confirmed that the output generated is correct. Here's the time output, first for large input 1, 10K lines with ranges a subset of [1, 10

and then for large input 2, 1K lines with a range of [1, 10

Not shabby for time--and to refer back to Mr. Reeder's blog post, I'm sure that with those times, an interpreted version would have no trouble running in the eight minute limit Code Jam imposes--but how about memory? Time to instrument again... and sure enough, keeping that list, or perhaps those lists, around eats RAM. With large input 1, it uses 60+ kilobytes of RAM (the graph generated doesn't show the scale all the way up the Y (RAM) axis), and with large input 2, it uses about five megabytes of RAM. Even the five megabytes is small change these days, and now I suspect I wasn't looking closely at which run I was checking memory usage for last time around. I bet I thought a 10

UPDATE: I'm truly amazed at the brevity of jagd's solution; you really should check it out. I will have a good time studying it further. Just for the heck of it, I compiled it with ghc -O2, and tried it on the large input files. For large input 1,

For large input 2,

I wonder what it's doing differently to make it run 63.8 times slower on the larger input rather than 7.84 times slower?

UPDATE: part of it is the way it's keeping the list of Ys both in numeric and string form, with lots of conversions back and forth. That means more memory; on large input 2, jagd's program uses over 55 Mbytes of RAM, with a very interesting shape to the plot of RAM versus time.

I see that jagd also at least thinks he or she is generating some values that might not be valid Ys, given the

applied to potential Ys. Maybe that's part of the RAM usage fluctuation.

("." is function composition in Haskell, and

I've been looking over other people's solutions to the "Fair and Square" problem, and they are pretty impressive. As a still-inexperienced Haskeller, I am taking inspiration from then rather than feeling discouraged.

In particular, jagd's solution reminded me that I was not thinking Haskell. Haskell is lazy; values aren't calculated until you require them. That lets you define infinite data structures! You only get into trouble if you do something that requires the whole thing.

For example,

**[1..]**is the infinite list of all positive integers. You can safely use it, as long as you only use a finite amount of it. For example, if you try**uptoSqrt n = [x | x <- [1..], x * x <= n]**and then ask for

**uptoSqrt 25**you will get back 1, 2, 3, 4, and 5, and then the function call will hang. You and I realize that

**\x -> x * x**is monotonically increasing for positive numbers, but Haskell doesn't, and obligingly tries to check the infinitely many integers greater than 5 to see whether any have squares no greater than 25. That takes forever, so it never finishes.To get the result you want, you will have to write something like

**uptoSqrt n = [x | x <- (takeWhile (\y -> y * y <= n) [1..])]**What's

**takeWhile**? Here's a declaration for it:**takeWhile :: (a -> Bool) -> [a] -> [a]**Give it a function that takes an a and returns a Bool, and a list of as, and it will hand you back a prefix of that list--all the items up to, but not including, the first one for which the function returns false.

With that definition,

**uptoSqrt 25**gives you the

**[1,2,3,4,5]**you expect, because we know that [1..] is in ascending order and the square function is monotonically increasing.

(There's nothing magic about

**takeWhile**; if you ask for**takeWhile even [2, 4..]**it will print even numbers forever, because

**[2, 4..]**is the infinite list of positive even integers, and hence**takeWhile**will never find an odd number to make it stop.)That means that I can simply define the whole infinite list of "fair and square" numbers. To make it possible to get those in a certain range, I will have to generate them in ascending order, so back comes the sort that we omitted for a while. I'm going to do something that clashes a bit with a Haskell convention for patterns in function definitions, because it matches the nomenclature of the problem discussion, in which the "fair and square" numbers are referred to as X and the palindromes they are squares of as Y.

**possibles nDigits =****if nDigits == 1 then [0..3]**

else if nDigits == 2 then [11, 22]

else sort (concat [noTwos nDigits, oneTwo nDigits, twoTwos nDigits])else if nDigits == 2 then [11, 22]

else sort (concat [noTwos nDigits, oneTwo nDigits, twoTwos nDigits])

**ys = concat [possibles n | n <- [1..]]****xs = [y * y | y <- ys]****palSquares m n = takeWhile (<= n) (dropWhile (< m) xs)****dropWhile**is kind of like**takeWhile**, but it gives you a suffix of the list you hand it, namely everything from the first item for which the function returns**false**on. If you drop the values less than m from the front of a sorted list, and then grab the values <= n from the front of what's left, you have precisely those values between m and n inclusive. (I'm being a bit free with my language here, of course. Haskell doesn't destroy the original list--at least not if anyone's looking.)Note that we aren't using

**floorSqrt**or**ceilSqrt**any more, so they can go away, and we thus aren't counting bits any more. We're defining the whole list, so we aren't counting decimal digits, either, and**digitsIn**can go away as well.So do infinite data structures provide any advantage? Well, consider that I have yet to actually solve the Code Jam problem as such. Remember that it requires reading a file with ranges of values m and n for which we have to print a line of a given format with the value of

**length (palSquares m n)**Defining the infinite list of Xs is a form of

*memoization*, remembering the values of a function you calculate so you don't have to do the work if you invoke it again with the same arguments. Suppose two of the input lines are**200 1000****40 500**Evaluating

**palSquares 200 1000**will evaluate all the Xs between 0 and 1000 (actually one more past that, because you have to find the stopping place), so that when it comes time to process the next line, all the values between 40 and 500 will have already been determined, so all we have to do is select the values in range.

It's not what you'd want for serious memoization; after all, the input lines

*could*all be within a small interval of the upper bound, so we would arguably have wasted the determination of all those lesser "fair and square" numbers. But it does have the potential of saving time. Let's see what happens with the Code Jam inputs (and with jagd's version of main)...Cool! Of course, I didn't submit these officially--I'm using a portion of someone else's solution, and peeked at the analysis of the problem, and took far longer to write the program than the rules permit, so it would in no way be fair to take credit--but Google let me download the inputs, and confirmed that the output generated is correct. Here's the time output, first for large input 1, 10K lines with ranges a subset of [1, 10

^{14}]:**real 0m0.229s**

user 0m0.220s

sys 0m0.008suser 0m0.220s

sys 0m0.008s

and then for large input 2, 1K lines with a range of [1, 10

^{100}]:**real 0m1.796s**

user 0m1.784s

sys 0m0.012suser 0m1.784s

sys 0m0.012s

Not shabby for time--and to refer back to Mr. Reeder's blog post, I'm sure that with those times, an interpreted version would have no trouble running in the eight minute limit Code Jam imposes--but how about memory? Time to instrument again... and sure enough, keeping that list, or perhaps those lists, around eats RAM. With large input 1, it uses 60+ kilobytes of RAM (the graph generated doesn't show the scale all the way up the Y (RAM) axis), and with large input 2, it uses about five megabytes of RAM. Even the five megabytes is small change these days, and now I suspect I wasn't looking closely at which run I was checking memory usage for last time around. I bet I thought a 10

^{100 }run was a 10^{14 }run, and I shouldn't have blamed the sort at all.UPDATE: I'm truly amazed at the brevity of jagd's solution; you really should check it out. I will have a good time studying it further. Just for the heck of it, I compiled it with ghc -O2, and tried it on the large input files. For large input 1,

**real 0m0.226s**

user 0m0.220s

sys 0m0.008suser 0m0.220s

sys 0m0.008s

For large input 2,

**real 0m14.410s**

user 0m14.200s

sys 0m0.184suser 0m14.200s

sys 0m0.184s

I wonder what it's doing differently to make it run 63.8 times slower on the larger input rather than 7.84 times slower?

UPDATE: part of it is the way it's keeping the list of Ys both in numeric and string form, with lots of conversions back and forth. That means more memory; on large input 2, jagd's program uses over 55 Mbytes of RAM, with a very interesting shape to the plot of RAM versus time.

I see that jagd also at least thinks he or she is generating some values that might not be valid Ys, given the

**filter (ispalindrome . square)**applied to potential Ys. Maybe that's part of the RAM usage fluctuation.

("." is function composition in Haskell, and

**filter**has a signature that should be familiar to you from**takeWhile**:**filter :: (a -> Bool) -> [a] -> [a]****filter**takes a function and a list and hands you back all the items in the list for which the function returns true.)
## Comments