What does your phone number spell?

Posted 18 years ago

You might want to visit DialAbc.com, which has more results. Stay here if you are interested in the theory behind it.

This article was actually written in 2002. Here, I explain a technique for figuring out which words are in which phone numbers. Full C source code is included.

How did the computer do that?

Quick answer:

TrieStore.h
TrieStore.c
main.c
Compile using gcc -o spellophone *.c on Unix/Linux. A competent C programmer can adapt it to run on Windows in about 5 minutes.

Long answer:

Computers are wonderful things. We take for granted that they can do stuff like the above in microseconds, but to non-computer scientists it seems like magic. If you've ever wondered what computer scientists do all day, this will help.

I developed the algorithm for spellophone over Christmas break 2001. It wasn't easy -- I went through three different drafts until I found the right one. Would you believe that the first one took over ten minutes to run on a ten-digit number on my old Pentium 166?

The first algorithm I tried is known as the "brute force" approach. I had the computer go through every possible combination of letters and check it for words. I thought I had a clever way of doing that because I used a trie to store all of the words in the dictionary. A trie is like a very smart parrot. You can shout letters at it, and it will squawk if anything you say forms a word. For example, if you say "D-O-G" at it, it will squawk because DOG is a word. If you then say "M-A" it will squawk again because DOGMA is also a word.
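To make the parrot concrete, here is a minimal sketch of a trie in C. It is not the code from TrieStore.c, which isn't reproduced in this article; the names TrieNode, trie_new, trie_insert, trie_step and trie_free are made up for illustration, and the sketch assumes uppercase A-Z words only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node layout: one child slot per letter A-Z. */
typedef struct TrieNode {
    struct TrieNode *child[26];
    bool is_word;   /* true if the letters so far spell a whole word */
} TrieNode;

/* Allocate an empty node (calloc zeroes it: no children, not a word). */
TrieNode *trie_new(void)
{
    return calloc(1, sizeof(TrieNode));
}

/* Add an uppercase A-Z word. Returns false on bad input or failed allocation. */
bool trie_insert(TrieNode *root, const char *word)
{
    assert(root != NULL && word != NULL);
    TrieNode *node = root;
    for (; *word; word++) {
        int i = *word - 'A';
        if (i < 0 || i >= 26)
            return false;
        if (node->child[i] == NULL) {
            node->child[i] = trie_new();
            if (node->child[i] == NULL)
                return false;
        }
        node = node->child[i];
    }
    node->is_word = true;
    return true;
}

/* Shout one letter at the parrot. Returns the node reached, or NULL if
 * no dictionary word continues with that letter. */
TrieNode *trie_step(const TrieNode *node, char letter)
{
    int i = letter - 'A';
    if (node == NULL || i < 0 || i >= 26)
        return NULL;
    return node->child[i];
}

/* Release a node and everything below it. */
void trie_free(TrieNode *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < 26; i++)
        trie_free(node->child[i]);
    free(node);
}
```

After inserting DOG and DOGMA, stepping through D, O, G lands on a node with is_word set (the squawk). Continuing with M gives a node that is only a prefix, and A squawks again.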

Despite the clever use of the trie, the program still ran too slowly. Can you guess why? Think about how many possible letters are in a telephone number.

If you look at a phone keypad, you'll see that each digit has three letters on it. Some of them have four on newer phones. So you can only make three one-letter words with one digit. But if you add a second digit, say "22", you can spell "AA, AB, AC, BA, BB, BC, CA, CB, CC", or 9 words. That's quite a bit more than three! It turns out that with 10 digits, there are 1049760 possible combinations of letters. And for each possible combination that spells actual words, there are lots of different ways those words can be placed together between dashes. So it turns out that the computer might have to go through millions of combinations, trying to pick out the ones that make sense. Unfortunately, even today's computers can't process all of those combinations fast enough. So I had to find a better way.
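The growth is easy to check for any particular number: the count of raw letter strings is the product of the number of letters on each key. A small sketch, assuming the standard keypad layout where 7 and 9 carry four letters; the function name count_letter_strings is made up for illustration, and it counts only letter strings, not the ways of splitting them with dashes.

```c
#include <assert.h>
#include <string.h>

/* Letters on a standard keypad; 0 and 1 carry none. */
static const char *const KEYPAD[10] = {
    "", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"
};

/* Number of raw letter strings a digit string can spell.
 * Returns 0 if any character is not a digit or has no letters. */
unsigned long long count_letter_strings(const char *digits)
{
    assert(digits != NULL);
    unsigned long long total = 1;
    for (; *digits; digits++) {
        if (*digits < '0' || *digits > '9')
            return 0;
        /* Each extra digit multiplies the count by its letter count. */
        total *= strlen(KEYPAD[*digits - '0']);
    }
    return total;
}
```

For "22" this gives the 9 strings listed above, and for "78225" it gives 4 x 3 x 3 x 3 x 3 = 324.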

Dynamic Programming

I looked at what the computer was doing, and it seemed to me that it was wasting a lot of its time asking the trie about the same words over and over again. For example, if the last three digits in the phone number didn't actually spell anything, it would still check them thousands of times in combination with the first part of the phone number.

Then something struck me. Last year, I had learned of something that was meant to deal with just this type of problem in one of my computer science courses. It's called "Dynamic Programming", or DP for short. Dynamic Programming is a way of programming that solves problems a little bit at a time. It's best for problems where each piece of the problem is built on the previous one, so once you solve all of these little pieces you can put them together and solve the entire puzzle. If I could figure out a way to make DP work, then the computer would only need to check each word once, and the program might work in seconds instead of minutes!

I racked my brain, trying to remember what Professor Chan had said in his thick accent. With DP, you have to make a grid of squares, and each square represents a small part of the solution. After a lot of thinking, I drew a grid on paper. Across the top, I wrote Starting Position, and along the side, I wrote Length of word. Each square would contain all of the words that you could spell if you started on a certain digit and used a certain number of letters.

My hands trembled as I stepped through the algorithm on paper. I used the phone number "78225" because I knew it spelled "QUACK." (I used to work for a company called Quack.com and that was part of their number). Here's what I came up with. The partial words are in normal printing, and the finished words in each square are in bold:

Starting digit
Length	7 (PQRS)	8 (TUV)	2 (ABC)	2 (ABC)	5 (JKL)
1	P, Q, R, S	T, U, V	A, B, C	A, B, C	J, K, L
2	PU, QU, RU, ST, SU	TA, UB, VA	AA, AB, AC, BA, CA	AL, BL, CL	.
3	PUB, PUC, QUA, RUB, STA, SUC	TAB, TAC, VAC	ACK, BAL, CAJ, CAK, CAL	.	.
4	PUBA, QUAC, RUBA, RUBB, STAB, STAC, SUCC	TACK	.	.	.
5	QUACK, STABL, STACK	.	.	.	.

What is good about this is the table could be calculated very quickly. Each square builds on the data that was already processed in the square above. I had done it in five minutes on paper -- the computer could do it in the blink of an eye. Also, it is pretty easy to string all of the words together so that the longest ones appear first.
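Here is a rough sketch of how such a table could be filled in C. It is an illustration under assumptions, not the code from main.c: the Square type, the size limits and the function names are invented, the seven-word WORDS array stands in for a real dictionary, and a simple linear prefix scan stands in for the trie to keep the example short. Each square is computed by extending the strings in the square above it with the letters of the next digit and keeping only those that still begin some dictionary word.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DIGITS  10
#define MAX_ENTRIES 128   /* hypothetical cap on strings per square */

typedef struct {
    char text[MAX_ENTRIES][MAX_DIGITS + 1];
    int count;
} Square;

static const char *const KEYPAD[10] = {
    "", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"
};

/* Stand-in dictionary (assumption); the real program loads a word list. */
static const char *const WORDS[] = {
    "QUACK", "STACK", "STAB", "PUB", "TACK", "TAB", "AB"
};
#define NUM_WORDS (sizeof WORDS / sizeof WORDS[0])

/* True if s begins at least one dictionary word. */
static bool is_prefix(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < NUM_WORDS; i++)
        if (strncmp(WORDS[i], s, n) == 0)
            return true;
    return false;
}

/* True if s is a complete dictionary word (a "bold" entry). */
bool is_word(const char *s)
{
    for (size_t i = 0; i < NUM_WORDS; i++)
        if (strcmp(WORDS[i], s) == 0)
            return true;
    return false;
}

/* Fill table[start][len - 1] with every string of len letters, starting
 * at digit index start, that begins some dictionary word. */
void build_table(const char *digits, Square table[MAX_DIGITS][MAX_DIGITS])
{
    assert(digits != NULL && table != NULL);
    size_t n = strlen(digits);
    if (n > MAX_DIGITS)
        n = MAX_DIGITS;
    memset(table, 0, sizeof(Square) * MAX_DIGITS * MAX_DIGITS);

    for (size_t start = 0; start < n; start++) {
        for (size_t len = 1; start + len <= n; len++) {
            char d = digits[start + len - 1];
            if (d < '0' || d > '9')
                break;
            Square *cur = &table[start][len - 1];
            /* Length 1 extends the empty string; longer squares extend
             * each string in the square directly above. */
            int nprev = (len == 1) ? 1 : table[start][len - 2].count;
            for (int p = 0; p < nprev; p++) {
                for (const char *c = KEYPAD[d - '0']; *c; c++) {
                    char buf[MAX_DIGITS + 1] = "";
                    if (len > 1)
                        memcpy(buf, table[start][len - 2].text[p], len - 1);
                    buf[len - 1] = *c;
                    if (is_prefix(buf) && cur->count < MAX_ENTRIES)
                        strcpy(cur->text[cur->count++], buf);
                }
            }
            if (cur->count == 0)
                break;   /* nothing left to extend in this column */
        }
    }
}
```

With this tiny word list and the number "78225", the bottom square of the first column ends up holding QUACK and STACK, and the length-4 square of the second column holds TACK. A real dictionary would fill in many more partial words, as in the table above.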

Dynamic Programming always involves two steps -- first creating the solution table and then analyzing it to piece the solution together. The way I chose to piece the solution together is pretty simple:

1. Begin at the leftmost column that you haven't worked on yet.
2. Get a word from the bottom-most square.
3. Now put that word in your phone number, and go to the digits that are now left over at the end. Start at step 1.
4. Once you run out of words, try the next one from the square you chose in step 2 before you went off to the leftover digits. If there are no more words left in the square, continue upwards. Repeat step 3.
5. Once you have exhausted the leftmost column, get rid of it, add the number instead of a letter, and start over until there are no more columns left in the table.

It's harder to explain than to do. Using the above data, you get the following words in this order: QUACK, STACK, STAB-5, PUBA-5, PUB-25, 7-TACK, 78-AB-5
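The five steps can be sketched as a short recursion in C. This is a simplified illustration, not the code from main.c: the names Results, spell_number and expand are invented, the six-word WORDS array is a stand-in dictionary, and to stay short the sketch checks each word directly against the digits instead of reading words out of the table. At each position it tries the longest words first, recurses on the leftover digits, and finally falls back to keeping the digit itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_RESULTS 64
#define MAX_OUT     64
#define MAX_NUMBER  20   /* keeps every spelling well under MAX_OUT */

typedef struct {
    char text[MAX_RESULTS][MAX_OUT];
    int count;
} Results;

static const char *const KEYPAD[10] = {
    "", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"
};

/* Stand-in dictionary (assumption); the real program loads a word list. */
static const char *const WORDS[] = {
    "QUACK", "STACK", "STAB", "PUB", "TACK", "AB"
};
#define NUM_WORDS (sizeof WORDS / sizeof WORDS[0])

/* Digit character for an uppercase letter, or '\0' if it has none. */
static char key_for(char letter)
{
    for (int d = 2; d <= 9; d++)
        if (strchr(KEYPAD[d], letter) != NULL)
            return (char)('0' + d);
    return '\0';
}

/* True if word spells the digits beginning at digits[pos]. */
static bool word_fits(const char *word, const char *digits, size_t pos)
{
    size_t i;
    for (i = 0; word[i]; i++)
        if (digits[pos + i] == '\0' || key_for(word[i]) != digits[pos + i])
            return false;
    return i > 0;
}

/* out[0..used) holds the spelling built so far for digits[0..pos). */
static void expand(const char *digits, size_t pos, char *out, size_t used,
                   bool has_word, Results *res)
{
    size_t n = strlen(digits);
    if (pos == n) {
        /* Keep only spellings that contain at least one word. */
        if (has_word && res->count < MAX_RESULTS)
            strcpy(res->text[res->count++], out);
        return;
    }
    /* Steps 2-4: longest words first, then recurse on the leftovers. */
    for (size_t len = n - pos; len >= 1; len--) {
        for (size_t w = 0; w < NUM_WORDS; w++) {
            if (strlen(WORDS[w]) != len || !word_fits(WORDS[w], digits, pos))
                continue;
            int added = snprintf(out + used, MAX_OUT - used, "%s%s",
                                 used ? "-" : "", WORDS[w]);
            expand(digits, pos + len, out, used + (size_t)added, true, res);
        }
    }
    /* Step 5: give up on this column and keep the digit itself. */
    bool prev_digit = used > 0 && isdigit((unsigned char)out[used - 1]);
    int added = snprintf(out + used, MAX_OUT - used, "%s%c",
                         (used && !prev_digit) ? "-" : "", digits[pos]);
    expand(digits, pos + 1, out, used + (size_t)added, has_word, res);
    out[used] = '\0';   /* restore the caller's prefix */
}

/* Fills res with spellings of digits; returns how many were found. */
int spell_number(const char *digits, Results *res)
{
    assert(digits != NULL && res != NULL);
    char out[MAX_OUT] = "";
    res->count = 0;
    if (strlen(digits) > MAX_NUMBER)
        return 0;
    expand(digits, 0, out, 0, false, res);
    return res->count;
}
```

With this stand-in word list, spell_number("78225", &r) produces QUACK, STACK, STAB-5, PUB-25, 7-TACK, 78-AB-5, in that order. A different dictionary will give a different list.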

Other work

There are quite a few web sites that do this kind of thing. Phonespell.org has a good description of how their algorithm works, in the F.A.Q. section.

dialabc.com is the nicest web site that I have seen to search for words in phone numbers.

Steve Hanov makes a living working on Rhymebrain.com, rapt.ink, www.websequencediagrams.com, and Zwibbler.com. He lives in Waterloo, Canada.

14 years ago

Nice work! Do you see any chance to have it scale better for longer query numbers? The folks at spellMyNumber.mnim.org somehow get around it. best regards.

vijayakrishnan

14 years ago

I have one doubt: how do I check particular digits of a phone number using JavaScript? For example, the number must start with 9367xxxxxx; otherwise it should return false. How do I set the condition? Please reply.

Steve Hanov

16 years ago

Want to know something weird?

In 2003 I used to have my phone number on my web page. "Only women are afraid to have their phone number on their web page!" I thought. How naive I was.

Then, out of the blue, I got a phone call from someone in, I don't know, Alabama? This guy woke me up at like, 10 in the morning (at this time I was a jobless bum and slept all day). He wanted to know how to adapt it to run on Windows.

I would have been happy to tell him, except that I only had a cell phone, and in Canada I was being charged for long distance incoming calls. So I just gave him some vague answers about makefiles and /usr/dict/words and quickly got this guy off the line.

So the motto is, don't post your phone number on your web page: otherwise, weird people call you and they might wake you up.

robert

17 years ago

This is really cool. good work.

18 years ago

Hey! Thanks for the program. Years ago, we had one that someone from MIT wrote as a hack that did a similar thing to yours, and also went the other way: you told it things you wanted to spell with your phone number, and it told you what the number sequence was (easy enough to do in your head, but tedious), and then, based on its interpretation of what you were trying to express, it would offer several close alternatives, so you had higher hopes of finding a specific number that was actually available. Do you know of any such beast? (It was an Athena-unix op sys app as I remember.) Thanks much!
