Tuesday, May 26, 2020

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list.

It turns out that the word "mackerel" has a curious property: there is exactly one of the fifty states that it has no letters in common with. So we will call a word with that property "a mackerel". There are quite a few of them. (It doesn't matter which state name is the unique one disjoint from the word. The state for "mackerel" is Ohio, but for "goldfish" it's Kentucky.)

For purposes of this problem, the permissible words are those in Peter Norvig's word list.

So... the Riddler poses the following questions:
  • Which mackerel (or mackerels) are the longest?
  • For extra credit, which state has the most mackerels?
I'd pose an extra question: are there any mackerel-less states?

The last Riddler using that list took some work to make efficient, so I was a bit worried... but I didn't need to be.

I've been doing more C than Python lately, so after some Googling to remind myself of some things, I ended up with this:


class DefaultEmpty(dict):
    def __missing__(self, key):
        self[key] = rv = []
        return rv

def setDict(wordFile):
    with open(wordFile, "r") as f:
        return {w:set(w) for w in f.read().splitlines()}

def isMackerel(wordSet, stateDict):
    nonOverlapper = None
    for state, stateSet in stateDict.items():
        if stateSet.isdisjoint(wordSet):
        if nonOverlapper:
            return None
        else:
            nonOverlapper = state
    return nonOverlapper

def mackerels(stateFile, wordFile):
    stateDict = setDict(stateFile)
    wordDict = setDict(wordFile)
    mackerelDict = {}
    stateMackerels = DefaultEmpty()
    for word, wordSet in wordDict.items():
        state = isMackerel(wordSet, stateDict)
        if state:
            mackerelDict[word] = state
            stateMackerels[state].append(word)
    return mackerelDict, stateMackerels

First we have the usual dictionary with a default value, in this case the empty list.

Next, a function that takes a file and generates a map from lines of the file to the sets created from them. (splitlines() pulls the newline characters, thank goodness! Otherwise we'd see no mackerels at all.)

Next, isMackerel(), which, given the set made from a word and the state dictionary, will return the appropriate state if the word is a mackerel and return None if the word isn't a mackerel.

Finally, what we really want: something that takes a file of state names (we were lazy and created the file with all lower case and spaces mashed out, e.g. "westvirginia") and the word file and creates two dictionaries: mackerelDict, which maps mackerels to their states, and stateMackerels, which maps states to their lists of mackerels--note that any state with no mackerels won't be a key of stateMackerels.

With all that, all we need to do is...

from mackerel import mackerels
mackerelDict, stateMackerels = mackerels("states.txt", "word_list.txt")
mackerelsByLength = sorted(mackerelDict.keys(), key=len, reverse=True)
statesByMackerelCount = sorted(stateMackerels.keys(), key=lambda s: len(stateMackerels[s]), reverse=True)

and we have what we need to answer the questions.

>>> [len(s) for s in mackerelsByLength[0:5]]
[23, 23, 21, 21, 20]
>>> mackerelsByLength[0:2]
['counterproductivenesses', 'hydrochlorofluorocarbon']

So, there are two 23-character mackerels, and they're the longest using this word list.

>>> statesByMackerelCount[0]
'ohio'
>>> len(stateMackerels["ohio"])
11342


Ohio has the most mackerels, 11,342 of them.

>>> len(stateMackerels.keys())
32


Eighteen states have no mackerels at all! Shame on us; we should've returned stateDict.keys() so we could easily get the names of the mackerel-less states. We do know the ones with mackerels:

>>> stateMackerels.keys()
['wyoming', 'colorado', 'alaska', 'tennessee', 'iowa', 'nevada', 'maine', 'mississippi', 'newjersey', 'oklahoma', 'delaware', 'illinois', 'indiana', 'maryland', 'texas', 'wisconsin', 'newyork', 'michigan', 'kansas', 'utah', 'virginia', 'oregon', 'connecticut', 'montana', 'newmexico', 'vermont', 'hawaii', 'kentucky', 'northdakota', 'missouri', 'ohio', 'alabama']

How long did all this take to run? The slowest part was running mackerels(), which took a bit over three seconds on my AMD FX-8370; nothing else seemed to take human-perceptible time.

No comments:

Riddler Classic, May 23, 2020—Holy Mackerel!

Another one using Peter Norvig's word list . It turns out that the word "mackerel" has a curious property: there is exactly ...