Tuesday, May 26, 2020

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list.

It turns out that the word "mackerel" has a curious property: there is exactly one of the fifty states that it has no letters in common with. So we will call a word with that property "a mackerel". There are quite a few of them. (It doesn't matter which state name is the unique one disjoint from the word. The state for "mackerel" is Ohio, but for "goldfish" it's Kentucky.)

For purposes of this problem, the permissible words are those in Peter Norvig's word list.

So... the Riddler poses the following questions:
  • Which mackerel (or mackerels) are the longest?
  • For extra credit, which state has the most mackerels?
I'd pose an extra question: are there any mackerel-less states?

The last Riddler using that list took some work to make efficient, so I was a bit worried... but I didn't need to be.

I've been doing more C than Python lately, so after some Googling to remind myself of some things, I ended up with this:


class DefaultEmpty(dict):
    def __missing__(self, key):
        self[key] = rv = []
        return rv

def setDict(wordFile):
    with open(wordFile, "r") as f:
        return {w:set(w) for w in f.read().splitlines()}

def isMackerel(wordSet, stateDict):
    nonOverlapper = None
    for state, stateSet in stateDict.items():
        if stateSet.isdisjoint(wordSet):
        if nonOverlapper:
            return None
        else:
            nonOverlapper = state
    return nonOverlapper

def mackerels(stateFile, wordFile):
    stateDict = setDict(stateFile)
    wordDict = setDict(wordFile)
    mackerelDict = {}
    stateMackerels = DefaultEmpty()
    for word, wordSet in wordDict.items():
        state = isMackerel(wordSet, stateDict)
        if state:
            mackerelDict[word] = state
            stateMackerels[state].append(word)
    return mackerelDict, stateMackerels

First we have the usual dictionary with a default value, in this case the empty list.

Next, a function that takes a file and generates a map from lines of the file to the sets created from them. (splitlines() pulls the newline characters, thank goodness! Otherwise we'd see no mackerels at all.)

Next, isMackerel(), which, given the set made from a word and the state dictionary, will return the appropriate state if the word is a mackerel and return None if the word isn't a mackerel.

Finally, what we really want: something that takes a file of state names (we were lazy and created the file with all lower case and spaces mashed out, e.g. "westvirginia") and the word file and creates two dictionaries: mackerelDict, which maps mackerels to their states, and stateMackerels, which maps states to their lists of mackerels--note that any state with no mackerels won't be a key of stateMackerels.

With all that, all we need to do is...

from mackerel import mackerels
mackerelDict, stateMackerels = mackerels("states.txt", "word_list.txt")
mackerelsByLength = sorted(mackerelDict.keys(), key=len, reverse=True)
statesByMackerelCount = sorted(stateMackerels.keys(), key=lambda s: len(stateMackerels[s]), reverse=True)

and we have what we need to answer the questions.

>>> [len(s) for s in mackerelsByLength[0:5]]
[23, 23, 21, 21, 20]
>>> mackerelsByLength[0:2]
['counterproductivenesses', 'hydrochlorofluorocarbon']

So, there are two 23-character mackerels, and they're the longest using this word list.

>>> statesByMackerelCount[0]
'ohio'
>>> len(stateMackerels["ohio"])
11342


Ohio has the most mackerels, 11,342 of them.

>>> len(stateMackerels.keys())
32


Eighteen states have no mackerels at all! Shame on us; we should've returned stateDict.keys() so we could easily get the names of the mackerel-less states. We do know the ones with mackerels:

>>> stateMackerels.keys()
['wyoming', 'colorado', 'alaska', 'tennessee', 'iowa', 'nevada', 'maine', 'mississippi', 'newjersey', 'oklahoma', 'delaware', 'illinois', 'indiana', 'maryland', 'texas', 'wisconsin', 'newyork', 'michigan', 'kansas', 'utah', 'virginia', 'oregon', 'connecticut', 'montana', 'newmexico', 'vermont', 'hawaii', 'kentucky', 'northdakota', 'missouri', 'ohio', 'alabama']

How long did all this take to run? The slowest part was running mackerels(), which took a bit over three seconds on my AMD FX-8370; nothing else seemed to take human-perceptible time.

Monday, February 10, 2020

Riddler Classic, Feb 7, 2020

Sometimes I can figure out the solution to either the Riddler Express or Riddler Classic. This week is the first time I'm reasonably confident I have them both.

It helped that Riddler Express was really, really easy this time. Riddler Classic, though, takes some more thought.

It started with a math professor, James Propp, whose child asked about the vertical bars used in math notation for absolute value. Propp initially said "They're just like parentheses so there isn't any ambiguity" but then realized that there is ambiguity. |-1|-2|-3| could be either

  • |-1| - 2 * |-3| = 1 - 2 * 3 = -5 or
  • |-1 * |-2| -3 = |-1 * 2 - 3| = |-2 - 3| = 5
...but wait! It could also be |-1| * -2 * |-3| = 1 * -2 * -3 = -6, so perhaps there's more ambiguity than the Riddler and Professor Propp realize. What's the source of this additional ambiguity?
  • '-' can be unary minus or it can be subtraction
  • in math notation, multiplication is expressed via concatenation: "xy" is x times y.
*NOTE*: See the update below. Professor Propp pointed out my mistaken interpretation.  I'll leave my erroneous code here--I was happy to have a problem that made me think seriously despite being sick as the proverbial dog, and I'll make a point of remembering the part about iterating over range(2^n).

If we were using programming language notation, an "open vertical bar" would be written "abs(" and a "close vertical bar" would be written ")".  We can generate the valid expressions corresponding to |-1|-2|-3|-4|-5|-6|-7|-8|-9| by generating the validly nested balanced strings of parentheses.


def rawCombinations(n):
    # generate all n-character strings with n / 2 open parens and
    # n / 2 close parens to correspond to vertical bars that surround
    # an expression we're taking the absolute value of. We only really
    # care about balanced strings, so we always start with ( and end
    # with ). For the Riddler problem, n = 10.
    for c in combinations(range(n - 2), (n - 2) // 2):
        yield f"({''.join('(' if i in c else ')' for i in range(n - 2))})"


def isValid(s):
    # return True iff the input string has correctly nested parentheses
    nestLevel = 0
    for c in s:
        nestLevel += 1 if c == '(' else -1
        if nestLevel < 0:
            return False
    return nestLevel == 0

Now, what about the digits? It turns out that what you do depends on the surrounding parentheses:
  • for "((", insert "-n *"
  • for "()", insert "-n"
  • for ")(*, either insert " - n * " or "* -n * "
  • for "))", insert "* -n"
So for a valid string of parentheses, there's some number of occurrences of ")(" in it; call it n.  Then we know there are 2^n possible expressions corresponding to that parenthesis string. I thought about recursion, but then realized that was silly. The numbers from 0 to 2^n - 1 correspond to a set of choices for each ")(", so it's a simple iterator:

def exprs(s):
    numAmbiguous = 0
    for i in range(len(s)-1):
        if s[i:i+2] == ')(':
            numAmbiguous += 1
        
    for i in range(2**numAmbiguous):
        digit = 1
        ambiguousIndex = 0
        result = ''
        for j in range(len(s)-1):
            pair = s[j:j+2]
            if pair[0] == '(':
                result += 'abs('
            elif pair[0] == ')':
                result += ')'
            if pair == ')(':
                if i & (1 << ambiguousIndex):
                    result += f' * {-digit} * '
                else:
                    result += f' - {digit} * '
                ambiguousIndex += 1
            elif pair == '((':
                result += f'{-digit} * '
            elif pair == '))':
                result += f' * {-digit}'
            elif pair == '()':
                result += f'{-digit}'
            digit += 1
        result += ')'
        yield result

Python, like a lot of "scripting" languages, has an eval() function you can pass a string to and get back a value, so all that's left is

def calcValues():
    # get the possible expressions, evaluate them, and build a dictionary
    # mapping value to the list of expressions that evaluate to it. (I'd
    # just keep a count, but I'm curious and having the expressions might
    # be good for testing.)
    c = defaultdict(list)
    for s in filter(isValid, rawCombinations(10)):
        for e in exprs(s):
            c[eval(e)].append(e)
    return c

You'll need

from itertools   import combinations
from collections import defaultdict

to make Python happy.

That done...

Python 3.7.6 (default, Jan  8 2020, 19:59:22)  
[GCC 7.3.0] :: Anaconda custom (64-bit) on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from riddlerClassic020720 import calcValues
>>> foo = calcValues()
>>> len(foo)
103


There's our answer. Just out of idle curiosity...

>>> sorted(list(foo))
[-362880, -362879, -181584, -181296, -175392, -60486, -60485, -60474, -57456, -3
0264, -30263, -30216, -30215, -17064, -17063, -16848, -14112, -13392, -11376, -5112, -5111, -3144, -3143, -3050, -3049, -3038, -2966, -2904, -2903, -2508, -1902, -1901, -1890, -1728, -1727, -924, -923, -918, -917, -906, -468, -467, -288, -287, -234, -233, -140, -139, -128, -56, 6, 7, 114, 124, 142, 162, 216, 390, 450, 460, 498, 726, 762, 763, 774, 862, 1224, 1944, 2446, 2450, 2790, 2904, 2905, 2998, 2999, 3010, 3082, 3144, 3145, 4104, 4968, 4969, 8530, 11376, 11377, 15096, 15106, 15134, 15144, 17064, 28512, 28513, 57456, 60426, 60474, 60475, 60486, 60534, 175392, 181438, 181442, 362880, 362881]

If you're like me, you're seriously tired of all those, as the Riddler elegantly put it, "those annoying memes about order of operations", but I have to admit that when I took a look at

>>> foo[7]
['abs(-1) - 2 * abs(-3) * -4 * abs(-5) - 6 * abs(-7) - 8 * abs(-9)']

I spent a little time convincing myself that yes, that really does evaluate to 7.

UPDATE: as mentioned above, it turns out that there's really only one choice for ')(', namely '- n *'. That simplifies exprs(), which is now just expr() and returns a value rather than yielding a sequence. In passing, we made calcValues() take the number of | as a parameter so we could easily try smaller cases. Here it is...

# Ridddler Classic for 02/07/2020
# How many values can the ambiguous expression
#  |-1|-2|-3|-4|-5|-6|-7|-8|-9|
# evaluate to?

from itertools   import combinations
from collections import defaultdict

def rawCombinations(n):
    # generate all n-character strings with n / 2 open parens and
    # n / 2 close parens to correspond to vertical bars that surround
    # an expression we're taking the absolute value of. We only really
    # care about balanced strings, so we always start with ( and end
    # with ). For the Riddler problem, n = 10.
    for c in combinations(range(n - 2), (n - 2) // 2):
        yield f"({''.join('(' if i in c else ')' for i in range(n - 2))})"

        
def isValid(s):
    # return True iff the input string has correctly nested parentheses
    nestLevel = 0
    for c in s:
        nestLevel += 1 if c == '(' else -1
        if nestLevel < 0:
            return False
    return nestLevel == 0


def expr(s):
    # Convert a balanced, properly nested string of parentheses to an
    # evaluatable expression. Upon reflection, we don't think there's
    # the additional ambiguity we thought existed.
    digit = 1
    result = ''
    for j in range(len(s)-1):
        pair = s[j:j+2]
        if pair[0] == '(':
            result += 'abs('
        elif pair[0] == ')':
            result += ')'
        if pair == ')(':
            result += f' - {digit} * '
        elif pair == '((':
            result += f'{-digit} * '
        elif pair == '))':
            result += f' * {-digit}'
        elif pair == '()':
            result += f'{-digit}'
        digit += 1
    return result + ')'


def calcValues(n):
    # get the possible expressions, evaluate them, and build a dictionary
    # mapping value to the list of expressions that evaluate to it. (I'd
    # just keep a count, but I'm curious and having the expressions might
    # be good for testing.)
    c = defaultdict(list)
    for s in filter(isValid, rawCombinations(n)):
        e = expr(s)
        c[eval(e)].append(e)

    return c

After which...

>>> foo = calcValues(10)
>>> len(foo)
42
>>> sorted(foo)
[-362879, -60485, -60474, -30215, -17063, -5111, -3143, -3049, -3038, -2966, -2904, -1901, -1890, -923, -917, -906, -467, -287, -233, -139, -128, -56, 6, 114, 124, 142, 216, 390, 450, 460, 726, 1944, 2446, 2790, 4968, 8530, 15096, 15106, 28512, 60426, 181438, 362880]
>>> sorted(len(foo[s]) for s in foo)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>>

so that there are really only 42 values, and no two of the expressions evaluate to the same number.

UPDATE(2), of the serious forehead slap flavor: I should've realized sooner that what is now expr() could be expressed much more simply:

fmtStr = {')(':') - {} * ', '((':'abs(-{} * ', '))':') * -{}', '()':'abs(-{}'}

def expr(s):
    # Convert a balanced, properly nested string of parentheses to an
    # evaluatable expression..

    return ''.join(fmtStr[s[i:i+2]].format(i+1) for i in range(len(s)-1)) + ')'

We had to drop back to using str.format(), because f-strings aren't strings. You might think that you can have a dictionary with f-strings as values, but it won't work the way you might think.  Also, at a minor cost in DRYness, we folded the leading character mapping into the format string.

UPDATE(3): I didn't quite fully excise my misconception. Changing fmtStr to

fmtStr = {')(':') - {} * ', 
          '((':'abs(-{} * ',
          '))':') -{}',       # pulled the '*'
          '()':'abs(-{}'}

gives the proper result.

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...