OK. I really will go on to doing something else in Haskell; I'm thinking of something that will actually get me into the M-word.
(Though in the back of my mind I will still wonder whether there's a way to make justOnes generate values in ascending order other than just sorting the darn things...)
I want to give a little more data.
I went back and compiled and ran the C solution to the "Fair and Square" problem I'd grabbed, to get time output so I could remember something more precise than the vague "about .4 seconds". The variance seems to be less than it is for "my" Haskell program (I have to use the scare quotes, given that the integer square root and semilattice search tree code aren't mine), so I'll just give one output, as usual from the data set with values up to a googol, from the middle:
real 0m0.455s
user 0m0.432s
sys 0m0.020s
In comparison, the best value from a Haskell run:
real 0m0.224s
user 0m0.176s
sys 0m0.044s
If I had to guess, I'd say maybe the difference in sys time is due to memory allocation; the C program goes to the heap for I/O buffers for the input and output files, but that's about it. The rest it does with fixed-size declared arrays.
How much memory, then? What dominates for the C program is a single array, a two-dimensional 50000 x 200 array of ints; at four bytes per int, that comes to 40 MB. The graph generated from profiling the current version of the Haskell program shows it using somewhere between 6.25 and 6.5 MB (I really wish they'd show the scale all the way up on that). So half the time, and given the other arrays in the C code, I think it's safe to say well under a sixth of the RAM.
The C code definitely wins on size of compiled program, though; with both stripped, it's 860K for Haskell, a hair over 10K for the C program. To be sure, that's taking advantage of shared libraries. I wonder whether ghc pays much attention to only including what's necessary in a compiled program? (Or maybe it's my doing, and I should be more specific about what I import.)
random notes and thoughts, mostly about Haskell these days, of a rather past middle-aged programmer
Saturday, June 08, 2013
Thursday, June 06, 2013
Skating around the edges
Still thinking a bit about how to churn out those palindromes in order, or at least take advantage of what order there is. If you take a look at justOnes, it may be that each of the k values gives a hunk in ascending order, so you may be able to foldl' merge [] them and get them in order for cheaper... (UPDATE: no such luck) though that may not actually be cheaper than a single O(n log n) sort.
But now we take another peek at the output. The slowness of piecing together Strings to generate output has been noticed... oh, yes, it has been noticed... and people have come up with a way to cut down on it: using shows or something like it.
shows returns a function that takes a String as parameter and returns a String: the returned String is a printable representation of the thing (which must be in the Show type class) with the String parameter appended at the end. That function type, String -> String, has its own synonym, ShowS. Instead of generating the String, you create a function that will generate it when called.
So... remember
format :: (Int, Int) -> String
format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result
? Well, now we have
showsResult :: (Int, Int) -> String -> String
showsResult (c, n) = ("Case #" ++) . shows c . (": " ++) . shows n
The use of "sections" makes the correspondence clear. Since we're now passing along functions instead of strings, they're used a little differently:
main = do
  s <- B.getContents
  let r = map format $ zip [1..] (map process (tail $ B.lines s))
  putStr (unlines r)
becomes
main = do
  s <- B.getContents
  let r = map showsResult $ zip [1..] (map process (tail $ B.lines s))
  mapM putStr $ map ($ "\n") r
Now, putStr, if you look it up, has the type String -> IO (), but it has a reassuring name. mapM, though, hints at the dreaded "M" word... monad. [insert dramatic chipmunk sound clip here] All it is here, though, is a monad-flavored map. Before, r was a [String]; now it's a [String -> String]. To get the [String], we use map with a section. (In Haskell, $ is the function application operator. We've used it before; its low priority makes it attractive as a way to avoid parentheses, but here, in the section ($ "\n"), it's more than a syntactic convenience: the section means "apply to "\n"".) Appending the "\n" ourselves means we don't need unlines; mapM putStr then just runs putStr on each of the Strings in the list.
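To make the difference-list flavor of ShowS concrete, here's a tiny standalone sketch (the names showsPair and results are mine, for illustration, not from the program): all the per-case functions are composed first, and the single application to "" at the end assembles the output in one pass.

```haskell
-- Each ShowS value is a String -> String; composing with (.) defers
-- all the concatenation until the final application.
showsPair :: (Int, Int) -> ShowS
showsPair (c, n) = ("Case #" ++) . shows c . (": " ++) . shows n . ('\n' :)

results :: [(Int, Int)]
results = [(1, 0), (2, 9), (3, 2)]

main :: IO ()
main = putStr (foldr ((.) . showsPair) id results "")
```

Running it prints the three Case lines in order; no intermediate String gets recopied as the pieces accumulate.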
Does it help? Well... where time output is concerned, it's a wash. The profiler output shows it taking .01 second longer. Ah, but the memory graph! The -hc graph shows the format version somewhere around 7 MB, but the showsResult version at maybe 6.25 MB. That's got to count for something.
Tuesday, June 04, 2013
"Blessed rage for order, pale Ramon..."
So, the big cost center is nDigitYs. All it does is concatenate our three flavors of Y (the ones with no 2s, the ones with one 2, and the ones with two 2s) and sort them. Of the d-digit Ys, there are at most two with two 2s, and there are d `div` 2 with one 2. By far the majority are those with no 2s; for d == 49, of the 4,122 Ys, 4,122 - 2 - 24 = 4,096 have no 2s.
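As a quick sanity check on that arithmetic (this snippet is mine, not part of the solution): for odd d, the no-2 count is twice the number of ways to drop up to three 1s into the d `div` 2 - 1 free positions of a half.

```haskell
choose :: Integer -> Integer -> Integer
m `choose` k = product [m - k + 1 .. m] `div` product [1 .. k]

main :: IO ()
main = do
  let d' = 49 `div` 2 - 1                               -- 23 free positions per half
      noTwos = 2 * sum [d' `choose` i | i <- [0 .. 3]]  -- doubled for the middle digit
  print (noTwos, noTwos + 24 + 2)                       -- the 4,096 and 4,122 above
```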
We know that the two-2 Ys are greater than all the others, so we could go with
nDigitYs n = sort (noTwos n ++ oneTwos n) ++ twoTwos n
...for what little good that does.
If you take a look at the result of oneTwos, you'll find that they're always in ascending order; we grind them out so that the optional 1 is first not there, and then in ascending order in the most significant half, which suffices to order them. noTwos, on the other hand, isn't in ascending order for d > 7. If there's some way to generate them in ascending order, we could go with
nDigitYs n = merge (noTwos n) (oneTwos n) ++ twoTwos n
(be sure to import Data.List.Utils) and O(n) beats O(n log n) any day. Unfortunately, I don't know of a good way to do that.
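For reference, if you'd rather not pull in MissingH for Data.List.Utils, the merge in question is short enough to write by hand; a minimal sketch:

```haskell
-- Merge two ascending lists into one ascending list in O(n + m).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xxs@(x:xs) yys@(y:ys)
  | x <= y    = x : merge xs yys
  | otherwise = y : merge xxs ys

main :: IO ()
main = print (merge [1, 4, 7] [2, 5, 6, 9 :: Int])
```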
Sunday, June 02, 2013
OK, I have to take one more look...
What can I say? I can sympathize with Lot's wife.
I have yet to show you the spiffy graphic you can get showing memory usage.
What does this tell us?
First, we can say that it looks like we're down about a megabyte from earlier. (Alas, they don't give the scale all the way up the vertical axis; if only they'd move the word "bytes" out of the way!) The other thing is, it looks like it's accumulating thunks until about halfway through when something finally demands evaluation and the RAM soufflé collapses.
(What's a thunk? The name goes back to the days when the world was new and people were figuring out how to generate code for Algol 60 programs. When the defining committee set up the rules for "call by name" they perhaps didn't realize the full consequences of their actions (vide Jensen's Device). Implementing it required generating a little function for each call by name parameter that returned a reference to it, and those little functions were called "thunks". In the context of Haskell and other lazy programming languages, a thunk represents a value that hasn't yet been evaluated.)
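A concrete illustration of thunk buildup (mine, not from the profiling runs): the Prelude's lazy foldl piles up nested (+) thunks until something finally forces the result, while the strict foldl' from Data.List evaluates the accumulator at each step and keeps the heap flat.

```haskell
import Data.List (foldl')

main :: IO ()
main = do
  -- foldl leaves (((0 + 1) + 2) + ...) unevaluated until print demands it;
  -- on a big enough list that's a pile of transient heap (or a stack overflow).
  print (foldl (+) 0 [1 .. 100000 :: Integer])
  -- foldl' forces each intermediate sum, so the accumulator stays a plain number.
  print (foldl' (+) 0 [1 .. 100000 :: Integer])
```

Both lines print the same total; the difference only shows up on the heap graph.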
Real World Haskell has a chapter on optimization that we've leaned heavily on, and it has a section on memory usage, thunks, and strictness issues. I will look more closely at that section and take one last shot at the palindrome program--after all, these issues will come up in the future as well. (No, really, this will be the last time...)
After the smoke clears...
OK. I think I've taken this problem as far as I can go... though someday I will set it up to read and write named files so I can run it in ghci. After all, this all started because I wanted to demonstrate that it's the algorithm, not the language or whether it's interpreted or compiled after reading Clif Reeder's "Ruby is Too Slow for Programming Competitions". On the other hand, I now have a Haskell program that's within spitting distance of running twice as fast as a C++ solution for the problem, taking a bit over 0.2 seconds to run the hairier official input file. Eight minutes, the time Code Jam gives you to run and send back results when it counts, is 480 seconds; I tend to doubt that interpreter overhead would come near a factor of 2,400, in which case the point is made.
This started with the less-than-noble motivation of "Well, I'll show him...", but quickly changed, for a couple of reasons.
- The unalloyed pleasure of problem solving always wins.
- Haskell is a sheer delight to use.
Here's the result. You'll need the semilattice code, which is available at the link above. I compiled with -O2 in the interests of speed.
{-
OK, here's what I hope is the ultimate "Fair and Square" program.
(See https://code.google.com/codejam/contest/2270488/dashboard#s=p2;
the core of the problem is, given two positive integers, M and N, determine
the number of base 10 palindromes between M and N inclusive that are squares
of base 10 palindromes.)
Best time output, reading the L2 input file, on a 2.8 GHz AMD Athlon II
X4 630:
real 0m0.227s
user 0m0.208s
sys 0m0.016s
[Ultimate? Silly me; there's always something else to try. However, I think
I'm done with it, for now at least.]
The main insights to speed up the process follow.
(4)-(6) are directly from the Google Code Jam analysis of the problem at
https://code.google.com/codejam/contest/2270488/dashboard#s=a&a=2
(9) is from Twan van Laarhoven's "Search trees without sorting", on his blog
at http://twanvl.nl/blog/haskell/SemilatticeSearchTree.
1. Look for palindromes in [ceil(sqrt(M))..floor(sqrt(N))] whose squares are
palindromes. (Following Google's analysis, we call a fair-and-square number
X and its square root Y, so X = Y^2.)
2. Don't look for palindromes, generate them. They're determined completely
(aside from the middle digit if there are an odd number of digits) by half
their digits, so a range of 10^(2*n) integers contains only on the order of
10^n palindromes--a square-root-sized fraction of the range searched in (1).
3. If Y^2 is to be a palindrome, Y's most/least significant digit can't be
greater than 3.
4. In particular, the Ys are precisely those that, when you square them via
long multiplication, have no carries in the addition of partial products.
5. (3) implies that the Xs have an odd number of digits.
6. (3) also implies that the middle digit of X, which is the sum of the
squares of the digits of the palindrome being squared, must be no larger
than 9. Therefore, *none* of the digits of Y can be greater than 3, and
3 only appears in Y = 3.
7. (6) implies that Ys come in three flavors: those with two 2s, those with
one 2, and those with no 2s.
8. Save for very small n, it's easier to count d-digit Ys than to generate
them--and remember, the problem only asks for the count.
9. By treating a binary tree as a semilattice, labeling branches with the
upper bounds of the corresponding subtrees, one can quickly search for
the first value in the tree >= a specified value.
10. By labelling the sorted Ys for a given number of digits with their ordinal
positions, one can use those tree searches to quickly determine how many
Ys are in a given interval.
NOTE: 0 is a one digit palindrome, and its square is, too, but since we are
only dealing with positive integers, we'll ignore zero.
-}
import Prelude
import Data.List
import SemilatticeSearchTree
import qualified Data.ByteString.Char8 as B
-- It's easy to calculate the number of n-digit Ys.
numNDigitYs :: Int -> Int
numNDigitYs n = if n == 1 then 3 else numTwoTwos n + numOneTwos n + numNoTwos n
  where numTwoTwos d = if even d then 1 else 2
        numOneTwos d = if even d then 0 else d `div` 2
        numNoTwos d = let d' = d `div` 2 - 1
                          n' = sum [d' `choose` i | i <- [0..min d' 3]]
                      in if even d then n' else 2 * n'
choose :: Int -> Int -> Int
m `choose` 0 = 1
m `choose` n = product [m - n + 1..m] `div` product [1..n]
-- We may wish to memoize numNDigitYs...
ysInDigitRange d1 d2 = sum [numNDigitYs d | d <- [d1..d2]]
{-
Now the slow part: generating and counting Ys in a range other than "all
d-digit numbers".
-}
powersOfTen = [10 ^ n | n <- [0..]] -- or, if you wish, map (10 ^) [0..]
tenToThe :: Int -> Integer
tenToThe n = powersOfTen !! n
twoTwos :: Int -> [Integer]
twoTwos n
  | n == 1    = []
  | even n    = [twoTwosNoOnes]
  | otherwise = [twoTwosNoOnes, twoTwosNoOnes + tenToThe (n `div` 2)]
  where twoTwosNoOnes = 2 * tenToThe (n - 1) + 2

oneTwos :: Int -> [Integer]
oneTwos n
  | n == 1    = [2]
  | even n    = []
  | otherwise = [p + 2 * tenToThe halfN | p <- justOnes n (min 1 (halfN - 1))]
  where halfN = n `div` 2

noTwos :: Int -> [Integer]
noTwos n
  | n == 1    = [1,3]
  | even n    = base
  | otherwise = concat [[p, p + tenToThe halfN] | p <- base]
  where halfN = n `div` 2
        base = justOnes n (min 3 (halfN - 1))
{-
justOnes -- generate the palindromes with a specified number of digits all of
whose non-zero digits are 1s. 1 must be the most/least significant digit; up
to a specified number of 1s are scattered through the rest of each half.
If the number of digits requested is odd, we leave a 0 in the middle digit,
for the caller to either replace with another digit or not.
-}
justOnes :: Int -> Int -> [Integer]
justOnes n o =
  let halfN = n `div` 2
      shell = tenToThe (n - 1) + 1
      innards = concat [(halfN - 1) `choices` k | k <- [0..o]]
      -- originally pair i = tenToThe i + tenToThe (n - (i + 1))
      ascending = take (halfN - 1) (drop 1 powersOfTen)
      descending = take (halfN - 1) (reverse (take (n - 1) powersOfTen))
      memoPair = zipWith (+) ascending descending
      pair i = memoPair !! (i - 1)
  in [foldl' (+) shell (map pair c) | c <- innards]

choices :: Int -> Int -> [[Int]]
m `choices` n
  | n == 0    = [[]]
  | m == n    = [[m, m-1..1]]
  | otherwise = [m : c | c <- (m - 1) `choices` (n - 1)]
                ++ ((m - 1) `choices` n)
nDigitYs :: Int -> [Integer]
nDigitYs n = sort (noTwos n ++ oneTwos n ++ twoTwos n)
-- We turn the sorted lists from nDigitYs into search trees.
-- Here we use van Laarhoven's semilattice search tree routines/types.
-- Sigh... we're doing a bit of cargo cult programming here, because even
-- though we'll never search on the ordinal index, we'll follow the blog
-- post example.
yTree :: Int -> SearchTree (Max Integer, Max Int)
yTree n = fromList [(Max x, Max y) | (x, y) <- zip (nDigitYs n) [0..]]
yTrees = map yTree [1..]
dDigitYTree :: Int -> SearchTree (Max Integer, Max Int)
dDigitYTree d = yTrees !! (d - 1)
-- ...leading up to this function.
ysInRange :: Integer -> Integer -> Int -> Int
ysInRange m n d
  | nVal == n = nPos - mPos + 1
  | otherwise = nPos - mPos
  where (mVal, mPos) = findFirst' m
        (nVal, nPos) = findFirst' n
        findFirst' x = case findFirst (Ge x, Any) (dDigitYTree d) of
                         Just (Max i, Max j) -> (i, j)
                         Nothing -> (tenToThe d, numNDigitYs d)
-- counting decimal digits and bits
-- the numbers here are big enough to nudge us past the obvious
powersOfTwo = map (2^) [0..]
bigTwos = map (2^) powersOfTwo
bigTens = map (10^) powersOfTwo
bitsIn n = bDigits n bigTwos
digitsIn n = bDigits n bigTens
bDigits :: Integer -> [Integer] -> Int
bDigits n xs = bDigits' n 1
  where bDigits' n s = let upToN = takeWhile (<= n) xs
                       in case upToN of
                            [] -> s
                            _  -> bDigits' (n `div` last upToN)
                                           (s + powersOfTwo !! (length upToN - 1))
-- The following is derived from a response to a Stack Overflow question
-- http://stackoverflow.com/questions/1623375/writing-your-own-square-root-function
-- which in turn is taken from Crandall & Pomerance, "Prime Numbers: A
-- Computational Perspective".
floorSqrt :: Integer -> Integer
floorSqrt 0 = 0
floorSqrt n = floorSqrt' (2 ^ ((1 + bitsIn n) `div` 2))
  where floorSqrt' x =
          let y = (x + n `div` x) `div` 2
          in if y >= x then x else floorSqrt' y

-- then the obvious...
ceilSqrt :: Integer -> Integer
ceilSqrt n =
  let y = floorSqrt n
  in if y * y == n then y else y + 1
-- Finally, the top level, which counts where it can, and generates, filters,
-- and counts where it must. Note that we break the interval down into ranges
-- that are subsets of d-digit numbers for appropriate values of d.
numYs :: Integer -> Integer -> Int
numYs m n
  | mDigits == nDigits   = ysInRange m n mDigits
  | mPower10 && nPower10 = ysInDigitRange mDigits (nDigits - 1)
  | mPower10             = ysInDigitRange mDigits (nDigits - 1)
                           + ysInRange (tenToThe (nDigits - 1) + 1) n nDigits
  | otherwise            = ysInRange m (tenToThe mDigits - 1) mDigits
                           + ysInDigitRange (mDigits + 1) (nDigits - 1)
                           + ysInRange (tenToThe (nDigits - 1) + 1) n nDigits
  where mDigits = digitsIn m
        nDigits = digitsIn n
        mPower10 = m == tenToThe (mDigits - 1)
        nPower10 = n == tenToThe (nDigits - 1)
numXs :: Integer -> Integer -> Int
numXs m n = numYs (ceilSqrt m) (floorSqrt n)
-- A new main, heavily inspired by "Haskell I/O for Imperative Programmers"
-- http://www.haskell.org/haskellwiki/Haskell_IO_for_Imperative_Programmers
-- (What's with that tail, you ask? We're skipping the first line, which says
-- how many more lines there are. They could have been persnickety and padded
-- the file with additional lines to make sure you have to pay attention to
-- that first line, but they weren't.)
main = do
  s <- B.getContents
  let r = map format $ zip [1..] (map process (tail $ B.lines s))
  putStr (unlines r)

process :: B.ByteString -> Int
process line = case map B.readInteger (B.words line) of
                 [Just (m, _), Just (n, _)] -> numXs m n
                 _ -> -1
format :: (Int, Int) -> String
format (lineNum, result) = "Case #" ++ show lineNum ++ ": " ++ show result
O(n) good, O(log n) better
What we need is a list/array of b, b^2, b^4, b^8, b^16, ... to count base b digits. The initial loop unrolling should have given us the idea already.
powersOfTwo = map (2^) [0..]
bigTwos = map (2^) powersOfTwo
bigTens = map (10^) powersOfTwo

bitsIn n = bDigits n bigTwos
digitsIn n = bDigits n bigTens

bDigits :: Integer -> [Integer] -> Int
bDigits n xs = bDigits' n 1
  where bDigits' n s = let upToN = takeWhile (<= n) xs
                       in case upToN of
                            [] -> s
                            _  -> bDigits' (n `div` last upToN)
                                           (s + powersOfTwo !! (length upToN - 1))
Whew! OK, so does it help? Best observed time output:
real 0m0.227s
user 0m0.208s
sys 0m0.016s
Worst observed time output:
real 0m0.238s
user 0m0.212s
sys 0m0.024s
Profiling output claims 0.20 seconds runtime (still; the difference is a few "ticks" that aren't noticeable in the calculated time). It's a little better, but we are definitely in the realm of diminishing returns. Let's check out the cost centers:
COST CENTRE             MODULE                 %time %alloc

nDigitYs                Main                    17.3   20.4
fromList.(...)          SemilatticeSearchTree   15.7   21.6
justOnes                Main                    10.7   12.9
process                 Main                     5.1    8.9
bDigits.bDigits'        Main                     5.1    2.1
fromList                SemilatticeSearchTree    4.6    0.5
justOnes.pair           Main                     4.1    0.0
floorSqrt.floorSqrt'.y  Main                     4.1    4.5
choices                 Main                     3.6    6.8
bDigits.bDigits'.upToN  Main                     3.6    2.8
meet                    SemilatticeSearchTree    3.0    0.0
choose                  Main                     2.5    2.3
yTree                   Main                     2.5    3.3
findFirst               SemilatticeSearchTree    2.5    0.0
justOnes.innards        Main                     2.0    1.2
mkBranch                SemilatticeSearchTree    2.0    2.5
main.r                  Main                     1.5    0.3
noTwos                  Main                     1.5    2.3
main                    Main                     1.0    0.6
numYs                   Main                     1.0    0.1
floorSqrt               Main                     1.0    0.7
satisfy                 SemilatticeSearchTree    1.0    0.0
satisfy                 SemilatticeSearchTree    1.0    0.6
meet                    SemilatticeSearchTree    0.0    2.8
Before, we had digitsIn and digitsIn.digitsIn', which collectively consumed 12.5% of the time and 9.3% of allocation. Now we have bDigits.bDigits' and bDigits.bDigits'.upToN, which just eat 8.7% of time and 4.9% of allocation (though speaking of allocation, of course we now have those three new lists sitting around).
UPDATE: It bugged me that I was grabbing a whole prefix of a list when I really only wanted the last item of that prefix (along with something like its length), so I added yet another helper function, resulting in
bDigits n xs = bDigits' n 1
  where bDigits' n s = case lastLE n xs of
          Nothing     -> s
          Just (m, p) -> bDigits' (n `div` m) (s + twoToThe p)

lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)
lastLE n xs =
  let lastLE' xs prevVal prevIndex
        | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
        | otherwise    = if prevIndex < 0 then Nothing
                         else Just (prevVal, prevIndex)
  in lastLE' xs (-1) (-1)
where twoToThe is just like tenToThe, using powersOfTwo instead of powersOfTen.
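A quick check that lastLE does what the prose says: Nothing when even the first element exceeds n, otherwise the last element that is at most n, paired with its index. (bigTens is repeated here so the snippet stands alone.)

```haskell
-- Last element of a sorted (possibly infinite) list that is <= n,
-- together with its index; Nothing if even the head is too big.
lastLE :: Integer -> [Integer] -> Maybe (Integer, Int)
lastLE n xs =
  let lastLE' xs prevVal prevIndex
        | head xs <= n = lastLE' (tail xs) (head xs) (prevIndex + 1)
        | otherwise    = if prevIndex < 0 then Nothing
                         else Just (prevVal, prevIndex)
  in lastLE' xs (-1) (-1)

powersOfTwo :: [Int]
powersOfTwo = map (2 ^) [0 ..]

bigTens :: [Integer]
bigTens = map (10 ^) powersOfTwo

main :: IO ()
main = do
  print (lastLE 5 bigTens)    -- even 10 is too big
  print (lastLE 1000 bigTens) -- 10^(2^1) = 100 is the last one <= 1000
```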
That helped; with the change, the cost center listing shows bDigits, bDigits.bDigits', and lastLE.lastLE' collectively chowing down on just 3.3% of total time and 2.1% of allocation, which beats the heck out of 8.7 and 4.9. The -hc heap graph shows the peak at around 6.5 MB rather than the 8 it sailed up to when we went to the semilattice trees.
A little more stuff
So, we know that things get slow when we actually have to generate some of the Ys for a range check. The range check itself is now O(log n), where n is the number of Ys we have to check. twoTwos is trivial, so that leaves oneTwos and noTwos, and they mostly call justOnes. Of course there's also the sorting, the labeling with ordinal position, and the tree building. sort doesn't even appear in the .prof file, so it must be down in the noise (or perhaps it's considered a primitive; after all, we don't see the additions split out). What we do see in the "cost center" list are justOnes and its helpers, and the expression in justOnes that the time gets charged to is
[shell + sum (map pair c) | c <- innards]
It turns out to chop an easy 10% off the execution time, according to profiling output, to recast that as
[foldl' (+) shell (map pair c) | c <- innards]
"WTF?" you say? Well, folding a list takes several things:
a -> a -> a
Instead, you could have a starting value of type b and then the function would have type
b -> a -> b
which you'll see for things like fromList, which builds one data structure from another. Since we want shell plus a sum, we might as well put shell in as the starting value rather than using 0 and adding shell when the smoke clears.
Probably more important is the way that foldl' is strict. Laziness is Haskell's outstanding virtue, but there are times when you want strictness, and avoiding building up potentially large stacks of thunks awaiting evaluation in a fold is one of those times.
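Here's the flavor of the rewrite in miniature, with hypothetical stand-ins for shell and pair (the real ones live inside justOnes). Both formulations compute the same number, but the foldl' version keeps the accumulator evaluated as it goes instead of piling up thunks:

```haskell
import Data.List (foldl')

-- Hypothetical stand-ins, just to show the two formulations agree.
shell :: Integer
shell = 1000

pair :: Integer -> Integer
pair = (* 2)

main :: IO ()
main = do
  let c = [1 .. 10] :: [Integer]
  print (shell + sum (map pair c))      -- original: sum, then add shell
  print (foldl' (+) shell (map pair c)) -- strict fold, shell as the seed
```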
With that one change, the profile files say the runtime drops from 0.23 s to 0.20 s. The new cost centers:
COST CENTRE MODULE %time %alloc
fromList.(...) SemilatticeSearchTree 18.4 20.6
nDigitYs Main 15.9 19.5
digitsIn.digitsIn' Main 10.0 9.1
justOnes Main 8.5 12.3
justOnes.pair Main 4.5 0.0
floorSqrt.floorSqrt'.y Main 4.0 4.3
fromList SemilatticeSearchTree 4.0 0.5
process Main 3.5 8.4
choices Main 3.5 6.5
findFirst SemilatticeSearchTree 3.0 0.0
digitsIn Main 2.5 0.2
satisfy SemilatticeSearchTree 2.5 0.0
meet SemilatticeSearchTree 2.5 2.7
justOnes.innards Main 2.0 1.1
floorSqrt Main 2.0 0.6
main Main 1.5 0.5
dDigitYTree Main 1.5 0.0
yTree Main 1.5 3.2
meet SemilatticeSearchTree 1.5 0.0
mkBranch SemilatticeSearchTree 1.5 2.4
noTwos Main 0.5 2.2
choose Main 0.0 2.2
Peeking at the call stacks shows that by far the majority of calls to digitsIn.digitsIn' take place in floorSqrt, where we're counting bits and the values passed range up to a googol, which means we're talking up to maybe 330-bit numbers (100 / .30103). Going eight at a time helped, but apparently not enough if we're now taking ten percent of our time counting bits.
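For what it's worth, that back-of-the-envelope figure can be checked exactly by counting the powers of two that fit under a googol:

```haskell
-- Exact bit length of a googol: the number of powers of two
-- (2^0, 2^1, ...) that do not exceed it.
main :: IO ()
main = print (length (takeWhile (<= 10 ^ 100) (iterate (* 2) 1)))
```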