What does your phone number spell?
You might want to visit DialAbc.com, which has more results. Stay here if you are interested in the theory behind it.
This article was actually written in 2002. Here, I explain a technique for figuring out which words are in which phone numbers. Full C source code is included.
How did the computer do that?
- Compile using gcc -o spellophone *.c on unix/linux. A competent C programmaer can adapt it to run on Windows in about 5 minutes.
Computers are wonderful things. We take for granted that they can do stuff like the above in microseconds, but to non-computer scientists it seems like magic. If you've ever wondered what computer scientists do all day, this will help.
I developed the algorithm for spellophone over Christmas break 2001. It wasn't easy -- I went through three different drafts until I found the right one. Would you believe that the first one took over ten minutes to run on a ten-digit number on my old Pentium 166?
The first algorithm I tried is known as the "brute force" approach. I had the computer go through every possible combination of letters and check it for words. I thought I had a clever way of doing that because I used a trie to store all of the words in the dictionary. A trie is like a very smart parrot. You can shout letters at it, and it will squawk if anything you say forms a word. For example, if you say "D-O-G" at it, it will squawk because DOG is a word. If you then say "M-A" it will squawk again because DOGMA is also a word.
Despite the clever use of the trie, the program still ran too slowly. Can you guess why? Think about how many possible letters are in a telephone number.
If you look at a phone keypad, you'll see that each digit has three letters on it. Some of them have four on newer phones. So you can only make three one-letter words with one digit. But if you add a second digit, say "22", you can spell "AA, AB, AC, BA, BB, BC, CA, CB, CC", or 9 words. That's quite a bit more than one! It turns out that if you add 10 digits, there are 1049760 possible combinations of letters. And for each possible combination that spells actual words, there are lots of different ways those words can be placed together between dashes. So it turns out that the computer might have to go through millions of combinations, trying to pick out the ones that make sense. Unfortunately, even today's computers can't process all of those words fast enough. So I had to find a better way.
Dynamic ProgrammingI looked at what the computer was doing, and it seemed to me that it was wasting a lot of its time asking the trie about the same words over and over again. For example, if the last three digits in phone number didn't actually spell anything, it would still check them thousands of times in combination with the first part of the phone number.
Then something struck me. Last year, I had learned of something that was meant to deal with just this type of problem in one of my computer science courses. It's called "Dynamic Programming", or DP for short. Dynamic Programming is a way of programming that solves problems a little bit at a time. It's best for problems where each piece of the the problem is built on the previous one, so once you solve all of these little pieces you can put them together and solve the entire puzzle. If I could figure out a way to make DP work, then the computer would only need to check each word once, and the program might work in seconds instead of minutes!
I racked my brain, trying to remember what Professor Chan had said in his thick accent. With DP, you have to make a grid of squares, and each square represents a small part of the solution. After a lot of thinking, I drew a grid on paper. Across the top, I wrote Starting Position, and along the side, I wrote Length of word. Each square would contain all of the words that you could spell if you started on a certain digit and used a certain number of letters.
My hands trembled as I stepped through the algorithm on paper. I used the phone number "78225" because I knew it spelled "QUACK." (I used to work for a company called Quack.com and that was part of their number). Here's what I came up with. The partial words are in normal printing, and the finished words in each square are in bold:
|Length||7 (PQRS)||8 (TUV)||2 (ABC)||2 (ABC)||5 (JKL)|
|P, Q, R, S||T, U, V||A, B, C||A, B, C||J, K, L|
|PU, QU, RU, ST, SU||TA, UB, VA||AA, AB, AC, BA, CA||AL, BL, CL||.|
|PUB, PUC, QUA, RUB, STA, SUC||TAB, TAC, VAC||ACK, BAL, CAJ, CAK, CAL||.||.|
|PUBA, QUAC, RUBA, RUBB, STAB, STAC, SUCC||TACK||.||.||.|
|QUACK, STABL, STACK||.||.||.||.|
What is good about this is the table could be calculated very quickly. Each square builds on the data that was already processed in the square above. I had done it in five minutes on paper -- the computer could do it in the blink of an eye. Also, it is pretty easy to string all of the words together so that the longest ones appear first.
Dynamic Programming always involves two steps -- first creating the solution table and then analyzing it to piece the solution together. The way I chose to piece the solution together is pretty simple:
- Begin at the leftmost column that you haven't worked on yet.
- Get a word from the bottom-most square.
- Now put that word in your phone number, and go to the digits that are now left-over at the end. Start at step 1.
- Once you run out of words, try the next one from the square you chose in step 2 before you went off to the left-over digits. If there are no more words left in the square, continue up-wards. Repeat step 3.
- Once you have exhausted the left-most column, get rid of it, add the number instead of a letter, and start over until there are no more columns left in the table.
It's harder to explain than to do. Using the above data, you get the following words in this order: QUACK, STACK, STAB-5, PUBA-5, PUB-25, 7-TACK, 78-AB-5
Other workThere are quite a few web sites that do this kind of thing. Phonespell.org has a good description of how their algorithm works, in the F.A.Q. section.
dialabc.com is the nicest web site that I have seen to search for words in phone numbers.
Comment spam defeated at lastFor years when running this blog, I would have to log in each day and delete a dozen comments due to spam. This was a chore, and I tried many ways to stem the tide.
Drawing Graphs with PhysicsTo my surprise, I found that there is a very simple way to arrange graphs that can be expressed in only a few lines of code, using force-directed placement...
Using the Acer Aspire One as a web serverA netbook can be ideal for a home web server. They are cheap, and use less power than a CFL light bulb.
0, 1, Many, a ZillionIt's common wisdom that there should only be three numbers in source code. But there's actually four. Here's why.
Building a better rhyming dictionaryBack in 2007, I created a rhyming engine based on the public domain Moby pronouncing dictionary. It simply reads the dictionary and looks for rhyming words by comparing the suffix of the words' pronunciations. Since that time, I have made some improvements.
Why you should go to the Business of Software Conference Next YearMost people, having already paid $2000.00 of their hard earned money, and then having flown, driven, or otherwise travelled to Boston to attend a conference, and then having paid an additional $250/night plus $33/night parking and "tourism taxes" to the Seaport Hotel -- most people, after all this, are unlikely to say that it was a waste of time and they should have stayed home watching the remaining salvaged episodes of Doctor Who on Netflix.
In fact, I found it quite useful.