Aging Nerd Notes

Tuesday, May 26, 2020

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list.

It turns out that the word "mackerel" has a curious property: there is exactly one of the fifty states that it has no letters in common with. So we will call a word with that property "a mackerel". There are quite a few of them. (It doesn't matter which state name is the unique one disjoint from the word. The state for "mackerel" is Ohio, but for "goldfish" it's Kentucky.)

For purposes of this problem, the permissible words are those in Peter Norvig's word list.

So... the Riddler poses the following questions:

Which mackerel (or mackerels) are the longest?
For extra credit, which state has the most mackerels?

I'd pose an extra question: are there any mackerel-less states?

The last Riddler using that list took some work to make efficient, so I was a bit worried... but I didn't need to be.

I've been doing more C than Python lately, so after some Googling to remind myself of some things, I ended up with this:

class DefaultEmpty(dict):

def __missing__(self, key):

self[key] = rv = []

return rv

def setDict(wordFile):

with open(wordFile, "r") as f:

return {w:set(w) for w in f.read().splitlines()}

def isMackerel(wordSet, stateDict):

nonOverlapper = None

for state, stateSet in stateDict.items():

if stateSet.isdisjoint(wordSet):

if nonOverlapper:

return None

else:

nonOverlapper = state

return nonOverlapper

def mackerels(stateFile, wordFile):

stateDict = setDict(stateFile)

wordDict = setDict(wordFile)

mackerelDict = {}

stateMackerels = DefaultEmpty()

for word, wordSet in wordDict.items():

state = isMackerel(wordSet, stateDict)

if state:

mackerelDict[word] = state

stateMackerels[state].append(word)

return mackerelDict, stateMackerels

First we have the usual dictionary with a default value, in this case the empty list.

Next, a function that takes a file and generates a map from lines of the file to the sets created from them. (splitlines() pulls the newline characters, thank goodness! Otherwise we'd see no mackerels at all.)

Next, isMackerel(), which, given the set made from a word and the state dictionary, will return the appropriate state if the word is a mackerel and return None if the word isn't a mackerel.

Finally, what we really want: something that takes a file of state names (we were lazy and created the file with all lower case and spaces mashed out, e.g. "westvirginia") and the word file and creates two dictionaries: mackerelDict, which maps mackerels to their states, and stateMackerels, which maps states to their lists of mackerels--note that any state with no mackerels won't be a key of stateMackerels.

With all that, all we need to do is...

from mackerel import mackerels
mackerelDict, stateMackerels = mackerels("states.txt", "word_list.txt")
mackerelsByLength = sorted(mackerelDict.keys(), key=len, reverse=True)
statesByMackerelCount = sorted(stateMackerels.keys(), key=lambda s: len(stateMackerels[s]), reverse=True)

and we have what we need to answer the questions.

>>> [len(s) for s in mackerelsByLength[0:5]]
[23, 23, 21, 21, 20]
>>> mackerelsByLength[0:2]
['counterproductivenesses', 'hydrochlorofluorocarbon']

So, there are two 23-character mackerels, and they're the longest using this word list.

>>> statesByMackerelCount[0]
'ohio'
>>> len(stateMackerels["ohio"])
11342

Ohio has the most mackerels, 11,342 of them.

>>> len(stateMackerels.keys())
32

Eighteen states have no mackerels at all! Shame on us; we should've returned stateDict.keys() so we could easily get the names of the mackerel-less states. We do know the ones with mackerels:

>>> stateMackerels.keys()
['wyoming', 'colorado', 'alaska', 'tennessee', 'iowa', 'nevada', 'maine', 'mississippi', 'newjersey', 'oklahoma', 'delaware', 'illinois', 'indiana', 'maryland', 'texas', 'wisconsin', 'newyork', 'michigan', 'kansas', 'utah', 'virginia', 'oregon', 'connecticut', 'montana', 'newmexico', 'vermont', 'hawaii', 'kentucky', 'northdakota', 'missouri', 'ohio', 'alabama']

How long did all this take to run? The slowest part was running mackerels(), which took a bit over three seconds on my AMD FX-8370; nothing else seemed to take human-perceptible time.

Monday, February 10, 2020

Riddler Classic, Feb 7, 2020

Sometimes I can figure out the solution to either the Riddler Express or Riddler Classic. This week is the first time I'm reasonably confident I have them both.

It helped that Riddler Express was really, really easy this time. Riddler Classic, though, takes some more thought.

It started with a math professor, James Propp, whose child asked about the vertical bars used in math notation for absolute value. Propp initially said "They're just like parentheses so there isn't any ambiguity" but then realized that there is ambiguity. |-1|-2|-3| could be either

|-1| - 2 * |-3| = 1 - 2 * 3 = -5 or
|-1 * |-2| -3 = |-1 * 2 - 3| = |-2 - 3| = 5

...but wait! It could also be |-1| * -2 * |-3| = 1 * -2 * -3 = -6, so perhaps there's more ambiguity than the Riddler and Professor Propp realize. What's the source of this additional ambiguity?

'-' can be unary minus or it can be subtraction
in math notation, multiplication is expressed via concatenation: "xy" is x times y.

*NOTE*: See the update below. Professor Propp pointed out my mistaken interpretation. I'll leave my erroneous code here--I was happy to have a problem that made me think seriously despite being sick as the proverbial dog, and I'll make a point of remembering the part about iterating over range(2^n).

If we were using programming language notation, an "open vertical bar" would be written "abs(" and a "close vertical bar" would be written ")". We can generate the valid expressions corresponding to |-1|-2|-3|-4|-5|-6|-7|-8|-9| by generating the validly nested balanced strings of parentheses.

def rawCombinations(n):

# generate all n-character strings with n / 2 open parens and

# n / 2 close parens to correspond to vertical bars that surround

# an expression we're taking the absolute value of. We only really

# care about balanced strings, so we always start with ( and end

# with ). For the Riddler problem, n = 10.

for c in combinations(range(n - 2), (n - 2) // 2):

yield f"({''.join('(' if i in c else ')' for i in range(n - 2))})"

def isValid(s):

# return True iff the input string has correctly nested parentheses

nestLevel = 0

for c in s:

nestLevel += 1 if c == '(' else -1

if nestLevel < 0:

return False

return nestLevel == 0

Now, what about the digits? It turns out that what you do depends on the surrounding parentheses:

for "((", insert "-n *"
for "()", insert "-n"
for ")(*, either insert " - n * " or "* -n * "
for "))", insert "* -n"

So for a valid string of parentheses, there's some number of occurrences of ")(" in it; call it n. Then we know there are 2^n possible expressions corresponding to that parenthesis string. I thought about recursion, but then realized that was silly. The numbers from 0 to 2^n - 1 correspond to a set of choices for each ")(", so it's a simple iterator:

def exprs(s):

numAmbiguous = 0

for i in range(len(s)-1):

if s[i:i+2] == ')(':

numAmbiguous += 1

for i in range(2**numAmbiguous):

digit = 1

ambiguousIndex = 0

result = ''

for j in range(len(s)-1):

pair = s[j:j+2]

if pair[0] == '(':

result += 'abs('

elif pair[0] == ')':

result += ')'

if pair == ')(':

if i & (1 << ambiguousIndex):

result += f' * {-digit} * '

else:

result += f' - {digit} * '

ambiguousIndex += 1

elif pair == '((':

result += f'{-digit} * '

elif pair == '))':

result += f' * {-digit}'

elif pair == '()':

result += f'{-digit}'

digit += 1

result += ')'

yield result

Python, like a lot of "scripting" languages, has an eval() function you can pass a string to and get back a value, so all that's left is

def calcValues():

# get the possible expressions, evaluate them, and build a dictionary

# mapping value to the list of expressions that evaluate to it. (I'd

# just keep a count, but I'm curious and having the expressions might

# be good for testing.)

c = defaultdict(list)

for s in filter(isValid, rawCombinations(10)):

for e in exprs(s):

c[eval(e)].append(e)

return c

You'll need

from itertools import combinations

from collections import defaultdict

to make Python happy.

That done...

Python 3.7.6 (default, Jan 8 2020, 19:59:22)
[GCC 7.3.0] :: Anaconda custom (64-bit) on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from riddlerClassic020720 import calcValues
>>> foo = calcValues()
>>> len(foo)
103

There's our answer. Just out of idle curiosity...

>>> sorted(list(foo))
[-362880, -362879, -181584, -181296, -175392, -60486, -60485, -60474, -57456, -30264, -30263, -30216, -30215, -17064, -17063, -16848, -14112, -13392, -11376, -5112, -5111, -3144, -3143, -3050, -3049, -3038, -2966, -2904, -2903, -2508, -1902, -1901, -1890, -1728, -1727, -924, -923, -918, -917, -906, -468, -467, -288, -287, -234, -233, -140, -139, -128, -56, 6, 7, 114, 124, 142, 162, 216, 390, 450, 460, 498, 726, 762, 763, 774, 862, 1224, 1944, 2446, 2450, 2790, 2904, 2905, 2998, 2999, 3010, 3082, 3144, 3145, 4104, 4968, 4969, 8530, 11376, 11377, 15096, 15106, 15134, 15144, 17064, 28512, 28513, 57456, 60426, 60474, 60475, 60486, 60534, 175392, 181438, 181442, 362880, 362881]

If you're like me, you're seriously tired of all those, as the Riddler elegantly put it, "those annoying memes about order of operations", but I have to admit that when I took a look at

>>> foo[7]
['abs(-1) - 2 * abs(-3) * -4 * abs(-5) - 6 * abs(-7) - 8 * abs(-9)']

I spent a little time convincing myself that yes, that really does evaluate to 7.

UPDATE: as mentioned above, it turns out that there's really only one choice for ')(', namely '- n *'. That simplifies exprs(), which is now just expr() and returns a value rather than yielding a sequence. In passing, we made calcValues() take the number of | as a parameter so we could easily try smaller cases. Here it is...

# Ridddler Classic for 02/07/2020
# How many values can the ambiguous expression
# |-1|-2|-3|-4|-5|-6|-7|-8|-9|
# evaluate to?

from itertools import combinations
from collections import defaultdict

def rawCombinations(n):
# generate all n-character strings with n / 2 open parens and
# n / 2 close parens to correspond to vertical bars that surround
# an expression we're taking the absolute value of. We only really
# care about balanced strings, so we always start with ( and end
# with ). For the Riddler problem, n = 10.
for c in combinations(range(n - 2), (n - 2) // 2):
yield f"({''.join('(' if i in c else ')' for i in range(n - 2))})"

def isValid(s):
# return True iff the input string has correctly nested parentheses
nestLevel = 0
for c in s:
nestLevel += 1 if c == '(' else -1
if nestLevel < 0:
return False
return nestLevel == 0

def expr(s):
# Convert a balanced, properly nested string of parentheses to an
# evaluatable expression. Upon reflection, we don't think there's
# the additional ambiguity we thought existed.
digit = 1
result = ''
for j in range(len(s)-1):
pair = s[j:j+2]
if pair[0] == '(':
result += 'abs('
elif pair[0] == ')':
result += ')'
if pair == ')(':
result += f' - {digit} * '
elif pair == '((':
result += f'{-digit} * '
elif pair == '))':
result += f' * {-digit}'
elif pair == '()':
result += f'{-digit}'
digit += 1
return result + ')'

def calcValues(n):
# get the possible expressions, evaluate them, and build a dictionary
# mapping value to the list of expressions that evaluate to it. (I'd
# just keep a count, but I'm curious and having the expressions might
# be good for testing.)
c = defaultdict(list)
for s in filter(isValid, rawCombinations(n)):
e = expr(s)
c[eval(e)].append(e)

return c

After which...

>>> foo = calcValues(10)
>>> len(foo)
42
>>> sorted(foo)
[-362879, -60485, -60474, -30215, -17063, -5111, -3143, -3049, -3038, -2966, -2904, -1901, -1890, -923, -917, -906, -467, -287, -233, -139, -128, -56, 6, 114, 124, 142, 216, 390, 450, 460, 726, 1944, 2446, 2790, 4968, 8530, 15096, 15106, 28512, 60426, 181438, 362880]
>>> sorted(len(foo[s]) for s in foo)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>>

so that there are really only 42 values, and no two of the expressions evaluate to the same number.

UPDATE(2), of the serious forehead slap flavor: I should've realized sooner that what is now expr() could be expressed much more simply:

fmtStr = {')(':') - {} * ', '((':'abs(-{} * ', '))':') * -{}', '()':'abs(-{}'}

def expr(s):
# Convert a balanced, properly nested string of parentheses to an
# evaluatable expression..

return ''.join(fmtStr[s[i:i+2]].format(i+1) for i in range(len(s)-1)) + ')'

We had to drop back to using str.format(), because f-strings aren't strings. You might think that you can have a dictionary with f-strings as values, but it won't work the way you might think. Also, at a minor cost in DRYness, we folded the leading character mapping into the format string.

UPDATE(3): I didn't quite fully excise my misconception. Changing fmtStr to

fmtStr = {')(':') - {} * ',
'((':'abs(-{} * ',
'))':') -{}', # pulled the '*'
'()':'abs(-{}'}

gives the proper result.

Thursday, June 27, 2019

A flashback and analogy

You've probably heard about how the notion of sum types (e.g. Algol 68 unions, Rust enums, Haskell types) and product types (e.g. tuples or structs) generalizes, e.g. [a] can be thought of as ${a^0} + {a^1} + {a^2} + {a^3} + ...$ where the exponent corresponds to the length of the list.

This morning I had a flashback from that to my days in introductory numerical analysis and Taylor series, Chebyshev polynomials, and other infinite bases for function spaces. Just as evaluating, e.g., exp(x) can be done by evaluating the first N terms of the corresponding Taylor series where N may vary depending on x but is nonetheless finite, in Haskell, as long as you only evaluate finitely many elements of an "infinite" list, you can do what you need to do.

Friday, June 21, 2019

I wish I were a dog

As I've mentioned recently, I'm now in a Meetup for people in the Des Moines, IA area where we're seriously doing the problems and paying close attention to the text, so I noticed this function intended to map your dog's age in years to a corresponding age in people years, while also introducing the beginning Haskell programmer to guards:

dogYrs :: Integer -> Integer
dogYrs x
| x <= 0 = 0
| x <= 1 = x * 15
| x <= 2 = x * 12
| x <= 4 = x * 8
| otherwise = x * 6

Looking at it, you can see that yes, dogs mature faster than people but the rate tapers off with increasing age (but still faster than people). OK...but then something looked wrong. A function like this ought to be monotonically increasing, right?

map dogYrs [0..5]
[0, 15, 24, 24, 32, 30]

So dogs don't age in human years between 2 and 3, and get younger in human years between 4 and 5? (If myDog took and returned types in Fractional, things would get even weirder, because then dogs would celebrate their second birthday plus epsilon by getting almost three years younger in human years, and their fourth birthday plus epsilon by getting almost eight years younger in human years.)

As a human, I feel cheated. As a person looking to get better at Haskell, I think I'd better write a quickCheck test that a function is monotonically increasing.

Monday, June 10, 2019

"Assignment" versus parameter passing

It's been a while. Still looking for work, but I'm happy to say that there is now a Des Moines area meetup for people learning Haskell, so that I'm going through Haskell Programming from First Principles more seriously this time and doing all the exercises.

Thanks to that, I realize that one can express the one well-behaved function of type a -> b -> a as curry fst, but I am confused by one problem (so far). It's the first of a set that has the student modify a type signature and see how it affects the binding of a particular expression to the variable whose type signature is changed. Initially we have

i :: Num a => a
i = 1

and the change is to

i :: a
i = 1

The first one clearly works; Num a => a is the type of numerical constants (without decimal points) in Haskell. The second, if you try it, fails, with ghci nudging you to put back the typeclass constraint... but at this point my time working on compilers comes back to me, and I recall the following: semantically, passing a parameter to a function is assignment. You assign the actual parameter to the corresponding formal parameter (really, you put it where the ABI says the parameter should go for the called function to get to it).

So... if you can't assign 1 to i when i's type signature is a, why can you pass 1 to id? Haskell must treat the two differently. Looks like I just have to find out what the difference is.

Tuesday, January 30, 2018

Trailing zeroes

A week or so ago someone mentioned the following problem:

How many trailing zeroes does 1000! have [when written out in base ten, of course]?

Easy enough to find...

Prelude> length $ takeWhile (== '0') $ reverse (show (product [1..1000]))
249
Prelude>

But is there a better way? Yes, there is. After reading that someone said he did it in his head, the way to do it in your head occurred to me. *slaps forehead*

1000!, like any positive integer, has a unique prime factorization. In particular, we're interested in the primes 5 and 2, because 10 is 5 * 2. They certainly both appear in the prime factorization of 1000!, and they thus have positive exponents in said factorization. Let m be the exponent for 5, and let n be the exponent for 2. How to calculate them? Let's start with five.

There are 200 multiples of 5 between 1 and 1000.
There are 40 multiples of 25 between 1 and 1000.
There are 8 multiples of 125 between 1 and 1000.
There's 1 multiple of 625 between 1 and 1000.

You recognize, of course, that 5, 25, 125, and 625 are positive powers of 5. The multiples of 625 contribute four to m. The multiples of 125 contribute 3... but one of them is 625, so we don't want to count things more than once. Ditto for multiples of 25 and multiples of 5... but there's an easy way to make sure you get the right count: just add 200, 40, 8, and 1. That counts multiples of 25 twice, as they should be, multiples of 125 three times, and multiples of 625 four times. The sum: 249. Looks familiar.

So, how about 2?

There are 500 multiples of 2 between 1 and 1000...

Hey, wait a minute. That's already way bigger than 249, so the minimum of m and n is exactly 249, and that's our answer, because it takes a five and a two to have a ten in the product, and there are exactly as many trailing zeroes as there are powers of 10 in 1000!

Wednesday, September 27, 2017

Sum of multiples problem

One of the exercism.io problems generalizes a Project Euler problem: what's the sum of multiples of 3 and 5 less than 1000? Seems simple enough:

sum [3, 6..999] + sum [5, 10..999]

but that's too big; it counts some values twice. With a little thought, though, you'll realize that the duplicates are the multiples of 15, so

sum [3, 6..999] + sum [5, 10..999] - sum[15, 30..999]

will do the trick. Why 15? Well, it's 3 * 5... but more on that later.

The exercism.io problem makes the upper bound a parameter, and allows an array (or list, depending on the language) of factors. Looking around (which you can do on the site after you've submitted your own solution), one sees two typical approaches:

Go through [1..limit - 1], pick out the values that are multiples of one or more of the factors, and add up the result.
Create a set or hash table from [[f, 2*f..limit - 1] f <- factors] (thus getting rid of the duplicates), and add up the elements of the set/keys of the hash table.

Both work, but they have drawbacks:

Divisibility checking means using the relatively expensive mod, potentially repeatedly, for each value in [1..limit - 1], and as many times as possible for values that aren't summed. As limit grows, that overhead grows.
Sets and hash tables build up an underlying data structure that grows with each value added to it. As limit grows, so does RAM usage. (My Python version using this method, handed an upper bound much larger than the exercism.io test values and set up to time with Python's timeit package, quickly sucked down immense amounts of virtual memory and had to be killed.)

There ought to be a way to generalize what we did for the initial [3, 5] case... and indeed there is. What did we do? We took the sum of the multiples of each factor and then subtracted off the sum of the duplicates. If there's just one factor, that sum is zero. If there are two, it's the sum of the multiples of... the product of the factors? Not quite!

Suppose that our factors are 6 and 21. 6 * 21 is 126, but 7 * 6 == 2 * 21 == 42. We want not the product, but the least common multiple of the two factors. 3 and 5 are relatively prime, so their least common multiple is their product (lcm m n == m * n `div` (gcd m n), and m and n are relatively prime iff their greatest common divisor is 1.) Try to fake us out, eh?

More generally, if you have factors f1, f2, ..., fn, each fi gives rise to {lcm fi fj | i < j} as new sets of factors whose multiples are duplicates of the multiples of fi--but those in turn have duplicate issues, so it's time to recur. You won't recur forever, because the sets of factors get smaller each time.

One more thing... we don't need to evaluate sum [f, f+f, .. limit - 1] at all. There's a better way to do it, and since Young Sheldon had its premiere this week, let's return to the 19th century and Young Carl to learn why we don't have to generate that arithmetic sequence or count on GHC to do fusion. Young Carl Friedrich Gauss was assigned the problem of adding up all the integers from one to one hundred (in the first version I read, the class was assigned the problem, and only after solving it could a student go out to recess). The teacher looked, but Carl wasn't busily adding numbers. Soon he wrote down a number and handed the paper in, and to the teacher's amazement it was correct. The math book I learned it from explained Gauss's insight this way: consider the numbers from 1 to 100 written down twice: once in ascending order, and immediately under that in descending order, and consider corresponding terms in the two sequences. The first, 1 and 100, add up to 101. To get from one pair to the next, the top increases by 1 and the bottom decreases by one, so the sum remains the same: 101. There are 100 such pairs, so collectively they add up to 100 * 101 = 10100... but remember, we wrote the sequence down twice, so that sum is twice the value we want. The number young Carl wrote was 5050, and in general, the sum of the integers from 1 to n is n * (n + 1) / 2, and as an immediate consequence the sum of the multiples of f less than limit is f * n * (n + 1) `div` 2 where n = (limit - 1) `div` f.

Here's what I ended up with. Haskell has both gcd and lcm functions, I'm happy to say.

sumOfMultiples :: [Integer] -> Integer -> Integer
sumOfMultiples factors limit =
    sum (map oneFactorSum factors) -
        sum (map (`sumOfMultiples` limit) (dupFactors factors))
    where sumOneTo     n  = n * (n + 1) `div` 2
          oneFactorSum f  = f * sumOneTo ((limit - 1) `div` f)
          dupFactors   fs = map lcmHead $ take (length fs - 1) (iterate tail fs)
          lcmHead      fs = [lcm (head fs) f | f <- tail fs]