The New York Times has come up with a new puzzle, called “Spelling Bee.” (Sorry, to use that link you apparently need a Times Crossword subscription, which comes bundled with many of their plans.) The idea is simple: they give you 7 letters, with one in the center. Here is today’s.
You are supposed to make as many words as you can. To count, a word must have at least 4 letters and must include the center letter. Four-letter words get one point, but longer words are worth one point per letter, and “pangrams,” which are words using all the letters, get an additional 7 points. Proper nouns and hyphenated words are out.
As you proceed, there is an ongoing, upbeat evaluation of how you are doing. At the start, with 0 points you are a, logically enough, a “beginner.” From there is moves through “good start,” “moving up,” “good,” “solid,” “nice,” “great,” “amazing,” and, in what sounds like a final destination, “genius.” When you hit genius, it gives you the option of stopping or continuing. If you choose to continue, and you manage to supply every Times-accepted word, you get the ultimate recognition:
Not that I would know, if I hadn’t cheated. The truth is: I suck at Spelling Bee. Claudia is really good and gets to “genius” on her own pretty much every day. I don’t think I’ve gotten beyond “nice.” Which is interesting to me, because I’m pretty good with words and I do well on crossword puzzles. I think I could get better, with practice, if I wanted to (more on that below), but maybe not that much better. Even so, Claudia has not yet achieved “queen bee” status on her own, that I know of. (I’m talking about in the game. Anyone who knows Claudia knows that she achieved queen bee status everywhere else long ago.)
So, to cut to the chase: I wrote a web app to cheat at Spelling Bee. You can find it at familykuhn.net/nat/spellingb.
Interestingly, there is quite a lot of empty space between “genius” and “queen bee.” I tracked this in obsessive detail only once, on May 9. I hit the 5th rung, “solid,” at 45 points. It took almost another hundred points to go three more steps to “amazing,” at 143. The next rung, “genius,” happened at 205, and I had to keep going to 280 to hit “queen bee.” The mileposts stretch further and further as the journey continues. I suspect they are awarded based on some percentage of the best possible score for a particular day, but that would be for another obsessive person to figure out, even I have my limits.
There are three things I find at least somewhat interesting about all this:
- What dictionary do they use to decide what’s a word and what isn’t? (Note: I wrote this question first, but my answer got somewhat tedious, even by own standards, so I moved it last.)
- How can you list all the possible solutions efficiently?
- Is it a good puzzle? What make a puzzle good, anyway?
And I’ll also add a local newspaper article I stumbled on a few years ago that suddenly has new relevance…
A big dataset, like a dictionary, needs some clever thinking. For example, how should you look up a word? The obvious answer is: go through all the words, one by one, and see if the word you want is in there. If you have a thousand words in your dictionary, the worst-case scenario is that you would need to check 999 words before matching on the thousandth. This known as a “brute-force” method. Is there a smarter way to do it?
The answer is yes: a process known as “binary search.” You don’t need to get freaked out by the word “binary”—there are no base-two calculations coming up. It just means you have a series of two-way choices. For example, let’s try to look something up in the “2of12inf” dictionary, described below. It has 81,883 total entries, so going through it one-by one is not an enticing process when done manually. Imagine you are walking down a road, and you hit a fork. In the middle of the fork is a sign, which says “liberates.” If that is the word you are looking for, you got extremely lucky and you got your answer on your first try: your word is in the dictionary. But that’s very unlikely. Suppose you want to look up “filially” (the adverbial form of “filial”). Because it comes before “liberates,” you take the left-hand fork, confident in your knowledge that the half of the words in the dictionary before “liberates” can be found there and only there. You walk a bit further and you come to a fork with sign saying “disguise.” Still not your word, but since your word comes after “disguise,” you take the right hand fork, which now leads to the half of the words on our current fork which are after “disguise,” which will be one-quarter of the words in the dictionary. You keep going until you either find your word or you come to the end of the road, that is, a place where there are no words left to compare. [Spoiler alert: “fillialy” is not in the dictionary. You would either get to “filial” and discover that there is no right turn, or to “filibuster” and discover that there is no left turn.] It occurred to me that there must be a good interactive example of this binary search algorithm on the web. They all involve a little more of a deep dive, but I particularly enjoyed VisuAlgo, with a runner up or two.
Each time you choose one fork over the other, the number of words ahead of you is cut in half (for us sticklers, very close to half). How many times can you cut 81,883 in half before you get to something less than 1? The answer turns out to be 17. This means that you will come to at most 17 forks in the road before you are guaranteed to get your answer. Doing 17 checks is a whole lot better than 81,883 (worst-case) or 40,000 or so on average.
If you have gotten this far, and this is new information to you, congratulations! You now understand an “algorithm.” And I hope it is clear that having a good algorithm (like binary search) makes certain problems do-able, which they may not have been if all you had was a bad algorithm (like brute-force search).
By the way, how do you know it takes at most 17 forks? Well, you could (a) take my word for it, (b) divide 81,883 by two, seventeen times (0.62….), or (c) compute the base-two logarithm of 81,883 (16.32…) and round up. OK, yes, I said no base-two calculations, but this is technically something different, plus it is optional.
In the case of the spelling bee game, I used a data structure that is a bit different from the “binary search tree” with half the words on either side of a “node.” I used a data structure called a “trie,” which I hadn’t heard of until I found this lovely post by Steve Hanov, while I was trying to figure out how to write my program.
The “root” node in the trie (apparently pronounced “tree” if you want to confuse people and “try,” if you don’t) has branches going to 26 nodes, depending on the first letter of your word. From there you have 26 further nodes, depending on the second letter. (Fewer than 26, actually: for example, if your word starts with “x,” there are only 3 possible choices for the second letter, “e,” “i,” and “y,” at least in the 2of12inf dictionary.)
The largest number of steps you need to go through in this process is whatever the length of your word is. And through the wonders of “recursion,” it is pretty easy to enumerate all the possible solutions. Take your list of allowable letters: “afilnty.” To search down the whole tree, start at your root node, take the “a” branch, and enumerate all the solutions from there. Then do the same for the “f” branch, and keep going one-by-one until you finish the “y” branch. Wait, what? Isn’t that circular reasoning? Well if it is circular reasoning, that is bad, and your program gets stuck in an infinite loop. But in this case it is not circular, it is “recursive.”
With our 2of12inf dictionary, we first go to the node that has all words starting with “a.” From there we follow to the node for all words starting with “aa.” From there we go to the node for all words starting with “aaa” except, wait! Our dictionary doesn’t have any. How about words starting “aaf.” Nope. “aai”? No. “aal”? “aan”? “aat”? “aay”? No, no, no, and no. OK, we are done with words starting “aa.” So no we go looking for words starting with “af.” Anything start with “afa”? And off we go. If you get to an actual word, you need to check that it includes the required center letter before you get too excited.
It takes a bit of thought to see that this will never get stuck in an infinite loop, because there is a limit to how long our dictionary words are. The beauty of recursion means that this whole process can be implemented in just a few lines of code.
Of course, to do this you need to put your dictionary into a “trie,” which means going through every single word. So if you are just doing one puzzle, you don’t save that much. But if you are doing a hundred, one after another, you have saved a huge amount of computing power! Well, some day the Times will publish a whole book of these, and we’ll be ready, right?
So is it a good puzzle?
I am sure that the fact that I don’t particularly like this puzzle is a classic example of sour grapes. So an honest list of why I don’t particularly like it would certain include #1.
- I’m not good at it.
- You need to be beyond-genius to actually complete a puzzle, which is where the little hit of dopamine for puzzle-solving comes from.
- “Completing” it depends on an unknown and sometimes arbitrary dictionary of allowable solutions.
- It gets harder as you go along, unlike, say a crossword puzzle which is easy at the beginning, gets harder, and then gets easier as you fill more stuff in.
- Doing well can take a long time.
What dictionary do they use?
Quick answer: I have no idea.
But even before that, how reasonable is their choice of what to include and exclude? I have my doubts.
For example, on May 9: how can they justify including “infantility” and “natality,” but exclude what seems to me to be the not-uncommon “flintily.” Why include “tinily” but exclude “tinnily?” I am having a hard time using “tinily” in a sentence, but maybe it’s because I’m distracted by my earbuds playing music tinnily. Why include “tali,” the plural of talus (ankle-bone), but exclude “ilia,” the plural of ilium (hip-bone) and “ilial.” I can understand including “lanai,” a darling of crossword enthusiasts, but excluding “latina”? Please.
Keep reading for a deep dive into what I discovered about dictionaries. Briefly: your mileage will vary a lot depending on you choice of dictionary. When you’ve had too much, skip to the cute ending. Here are the three dictionaries that I used:
- Official SCRABBLE Player’s Dictionary, 3rd ed (OSD3, below, and “s3” in the program output), which I also plundered from Steve Hanov.
- “2of12inf” from 12dicts. This is their dictionary that is supposed to be optimized for game playing. “12d” in the program output, followed by “!” if the word is flagged as a neologism, and “%” if it’s a word that should exist by the rules of word-formation but which no one actually uses.
- Collins Scrabble Words (2015) which fell off a truck somewhere; “c” in the program output.
For each solution, my program tells you which dictionaries it comes from. If it’s in all the dictionaries, the list shows up in green. (The word is in green if it’s a pangram.)
How closely do these agree with one another? And how closely do they agree with the Times? Read on, if you really want to know.
When I popped “mcehnta” (May 5) into the program, I found 133 “solutions.” Only 47 of them are in all three dictionaries.
You can find out the Times’ opinion of acceptable solutions the day after the puzzle appears (and that seems to be it—no archive of past puzzles yet). Our puzzle has 55 accepted solutions:
The yellow check marks are the ones I entered. I missed six (enema, manhattan, matcha, meet, teammate, and teem), mostly through failing to type in things that appeared my list of 133. Four of the six were in all three dictionaries. One (manhattan) was only in the Collins dictionary. And how is that not a proper noun, which should be excluded? One of them, matcha, is not in any of my dictionaries.
Of the 47 solutions that were in all three of my dictionaries, six were not accepted by the Times: cementa, enemata, heme, mach, manana, and matt. OK, I can live without cementa and enemata (though honestly, if you include thema shouldn’t you include themata?). And maybe you have to be a doctor to think of heme as a widely accepted word. Doctors use both “heme” and “hamate” (which the Times accepted), but they probably use heme a hundred times for every hamate.
Of the 55 accepted by the Times, 12 were not in all three dictionaries. OK, the four pangrams (attachment, catchment, enchantment, and enhancement) are all too long to be in the OSD3, which is limited to seven (or is it eight?) letters. (There is another even more official Scrabble dictionary, the OCTWL, that includes words of up to 15 letters, the width of the board.) By the way it seems to be quite unusual to have four pangrams—most puzzles seem to have just one, but they are all supposed to have at least one.
The remaining eight Times-accepted non-consensus choices: emanant*, hamate, manhattan*, meme, mentee*, meta, meth, thema*. The four starred words are only in the Collins dictionary. So it seems that every Times-accepted word is in Collins, which is not surprising, because Collins is just much bigger, as noted below. There are, however, lots of words in Collins that the Times does not accept. Clearly I need a Venn diagram! Of the remaining four words, meme is in 12dicts, flagged as a neologism, but not in OSD3. Hamate, meta, and meth are in OSD3, but not 12dicts.
If you are still reading: the dictionary thing is complicated!
As a result, I decided to add my own dictionaries: one for additions, like “matcha,” that the Times accepts but aren’t in my dictionaries, and an anti-dictionary for exclusions, like the 79 words mentioned above. This will help make future solutions more accurate, since I have seen that the same excluded words, like “amah,” come up over and over. Anyway, “mcehnta” now yields the 55 Times-approved solutions.
While I was at it, I had the program count the words; originally all of them, and then only the words with four or more letters. So now I can tell you how many words in each dictionary have three or fewer letters:
- OSPD3: 1,075 out of 80,612 (1.33%)
- 2of12inf: 711 out of 81,882 (0.87%)
- Collins: 1,465 out of 276.643 (0.53%)
Of course, a Scrabble dictionary will have a great emphasis on two- and three-letter words, so it is not surprising that OSPD3’s percentage is high; although Collins is a Scrabble dictionary, it is so much larger that the added heft swamps the less-then-four-letter ones.
Local team makes good…
In November 2015, our local paper, the Belmont Citizen-Herald, had a lead article, “Chenery Cheetahs win spelling bee,” talking about a team from the Chenery Middle School, our town middle school. I found it so amusing that anyone in the Boston area would name a team “the Cheetahs” that I scanned the article. But by a bizarre quirk of fate, it is less than four years later and I am a spelling bee “cheetah” in my own right.