Hate UML?

Draw sequence diagrams in seconds.
http://www.websequencediagrams.com

What does your phone number spell?
Posted on: 2007-04-07 18:35:22

Type it in here and see.
You might want to visit DialAbc.com, which has more results. Stay here if you are interested in the theory behind it.

This article was actually written in 2002. Here, I explain a technique for figuring out which words are in which phone numbers. Full C source code is included.

How did the computer do that?

Quick answer:

  • TrieStore.h
  • TrieStore.c
  • main.c
  • Compile using gcc -o spellophone *.c on unix/linux. A competent C programmaer can adapt it to run on Windows in about 5 minutes.

Long answer:

Computers are wonderful things. We take for granted that they can do stuff like the above in microseconds, but to non-computer scientists it seems like magic. If you've ever wondered what computer scientists do all day, this will help.

I developed the algorithm for spellophone over Christmas break 2001. It wasn't easy -- I went through three different drafts until I found the right one. Would you believe that the first one took over ten minutes to run on a ten-digit number on my old Pentium 166?

The first algorithm I tried is known as the "brute force" approach. I had the computer go through every possible combination of letters and check it for words. I thought I had a clever way of doing that because I used a trie to store all of the words in the dictionary. A trie is like a very smart parrot. You can shout letters at it, and it will squawk if anything you say forms a word. For example, if you say "D-O-G" at it, it will squawk because DOG is a word. If you then say "M-A" it will squawk again because DOGMA is also a word.

Despite the clever use of the trie, the program still ran too slowly. Can you guess why? Think about how many possible letters are in a telephone number.

If you look at a phone keypad, you'll see that each digit has three letters on it. Some of them have four on newer phones. So you can only make three one-letter words with one digit. But if you add a second digit, say "22", you can spell "AA, AB, AC, BA, BB, BC, CA, CB, CC", or 9 words. That's quite a bit more than one! It turns out that if you add 10 digits, there are 1049760 possible combinations of letters. And for each possible combination that spells actual words, there are lots of different ways those words can be placed together between dashes. So it turns out that the computer might have to go through millions of combinations, trying to pick out the ones that make sense. Unfortunately, even today's computers can't process all of those words fast enough. So I had to find a better way.

Dynamic Programming

I looked at what the computer was doing, and it seemed to me that it was wasting a lot of its time asking the trie about the same words over and over again. For example, if the last three digits in phone number didn't actually spell anything, it would still check them thousands of times in combination with the first part of the phone number.

Then something struck me. Last year, I had learned of something that was meant to deal with just this type of problem in one of my computer science courses. It's called "Dynamic Programming", or DP for short. Dynamic Programming is a way of programming that solves problems a little bit at a time. It's best for problems where each piece of the the problem is built on the previous one, so once you solve all of these little pieces you can put them together and solve the entire puzzle. If I could figure out a way to make DP work, then the computer would only need to check each word once, and the program might work in seconds instead of minutes!

I racked my brain, trying to remember what Professor Chan had said in his thick accent. With DP, you have to make a grid of squares, and each square represents a small part of the solution. After a lot of thinking, I drew a grid on paper. Across the top, I wrote Starting Position, and along the side, I wrote Length of word. Each square would contain all of the words that you could spell if you started on a certain digit and used a certain number of letters.

My hands trembled as I stepped through the algorithm on paper. I used the phone number "78225" because I knew it spelled "QUACK." (I used to work for a company called Quack.com and that was part of their number). Here's what I came up with. The partial words are in normal printing, and the finished words in each square are in bold:

Starting digit
Length 7 (PQRS)8 (TUV)2 (ABC)2 (ABC) 5 (JKL)
P, Q, R, S T, U, V A, B, C A, B, C J, K, L
PU, QU, RU, ST, SU TA, UB, VA AA, AB, AC, BA, CA AL, BL, CL .
PUB, PUC, QUA, RUB, STA, SUC TAB, TAC, VAC ACK, BAL, CAJ, CAK, CAL . .
PUBA, QUAC, RUBA, RUBB, STAB, STAC, SUCC TACK . . .
QUACK, STABL, STACK . . . .

What is good about this is the table could be calculated very quickly. Each square builds on the data that was already processed in the square above. I had done it in five minutes on paper -- the computer could do it in the blink of an eye. Also, it is pretty easy to string all of the words together so that the longest ones appear first.

Dynamic Programming always involves two steps -- first creating the solution table and then analyzing it to piece the solution together. The way I chose to piece the solution together is pretty simple:

  1. Begin at the leftmost column that you haven't worked on yet.
  2. Get a word from the bottom-most square.
  3. Now put that word in your phone number, and go to the digits that are now left-over at the end. Start at step 1.
  4. Once you run out of words, try the next one from the square you chose in step 2 before you went off to the left-over digits. If there are no more words left in the square, continue up-wards. Repeat step 3.
  5. Once you have exhausted the left-most column, get rid of it, add the number instead of a letter, and start over until there are no more columns left in the table.

It's harder to explain than to do. Using the above data, you get the following words in this order: QUACK, STACK, STAB-5, PUBA-5, PUB-25, 7-TACK, 78-AB-5

Other work

There are quite a few web sites that do this kind of thing. Phonespell.org has a good description of how their algorithm works, in the F.A.Q. section.

dialabc.com is the nicest web site that I have seen to search for words in phone numbers.

Want more programming tech talk?
Add to Circles on Google Plus
Subscribe to posts

Post comment

Real Name:
Your Email (Not displayed):

Text only. No HTML. If you write "http:" your message will be ignored.
Choose an edit password if you want to be able to edit or delete your comment later.
Editing Password (Optional):

su

2007-06-18 04:01:44
Hey! Thanks for the program. Yers ago, we had one hat someone from MIT wrote as a hack that did a similar thing to yours, and also went the other way - you told it things you wanted to spell with your phone nymber, and it told you what the number sequence was (easy enough to do in your head, but tedious) and then based on it's interpretation of what you were trying to express, it would offer several close alternatives, so you had higher hopes of finding a specific number that was actually available. Do you know of any such beast? (Ut was an Athena-unix op sys app asI remember), Thanks much!

robert

2007-08-03 18:38:06
This is really cool. good work.

Steve Hanov

2008-09-19 23:59:01
Want to know something weird?

In 2003 I used to have my phone number on my web page. "Only women are afraid to have their phone number on their web page!" I thought. How naive I was.

Then, out of the blue, I got a phone call frome someone in, I don't know, Alamaba? This guy woke me up at like, 10 in the morning (at this time I was a jobless bum and slept all day). He wanted to know how to adapt it to run on Windows.

I would have been happy to tell him, except that I only had a cell phone, and in Canada I was being charged for long distance incoming calls. So I just gave him some vague answers about makefiles and /usr/dict/words and quickly got this guy off the line.

So the motto is, don't post your phone number on your web page: otherwise, weird people call you and they might wake you up.

vijayakrishnan

2010-10-12 05:10:50
i have one doubt .how to check the particular digits of the phone number using java script.for example number starts only 9367xxxxxx otherwise it returns false statement .how did i set the condition plz reply

bs

2010-11-21 05:26:33
Nice work! Do you see any chance to have it scale better for longer query numbers? The folks at spellMyNumber.mnim.org somehow get around it. best regards.
Email
steve.hanov@gmail.com

Other posts by Steve

Yes, You Absolutely Might Possibly Need an EIN to Sell Software to the US How Asana Breaks the Rules About Per-Seat Pricing 5 Ways PowToon Made Me Want to Buy Their Software How I run my business selling software to Americans 0, 1, Many, a Zillion Give your Commodore 64 new life with an SD card reader 20 lines of code that will beat A/B testing every time [comic] Appreciation of xkcd comics vs. technical ability VP trees: A data structure for finding stuff fast Why you should go to the Business of Software Conference Next Year Four ways of handling asynchronous operations in node.js Type-checked CoffeeScript with jzbuild Zero load time file formats Finding the top K items in a list efficiently An instant rhyming dictionary for any web site Succinct Data Structures: Cramming 80,000 words into a Javascript file. Throw away the keys: Easy, Minimal Perfect Hashing Why don't web browsers do this? Fun with Colour Difference Compressing dictionaries with a DAWG Fast and Easy Levenshtein distance using a Trie The Curious Complexity of Being Turned On Cross-domain communication the HTML5 way Five essential steps to prepare for your next programming interview Minimal usable Ubuntu with one command Finding awesome developers in programming interviews Compress your JSON with automatic type extraction JZBUILD - An Easy Javascript Build System Pssst! Want to stream your videos to your iPod? "This is stupid. Your program doesn't work," my wife told me The simple and obvious way to walk through a graph Asking users for steps to reproduce bugs, and other dumb ideas Creating portable binaries on Linux Bending over: How to sell your software to large companies Regular Expression Matching can be Ugly and Slow C++: A language for next generation web apps qb.js: An implementation of QBASIC in Javascript Zwibbler: A simple drawing program using Javascript and Canvas You don't need a project/solution to use the VC++ debugger Boring Date (comic) barcamp (comic) How IE <canvas> tag emulation works I didn't know you could mix and match (comic) Sign here (comic) It's a dirty job... (comic) The PenIsland Problem: Text-to-speech for domain names Pitching to VCs #2 (comic) Building a better rhyming dictionary Does Android team with eccentric geeks? (comic) Comment spam defeated at last Pitching to VCs (comic) How QBASIC almost got me killed Blame the extensions (comic) How to run a linux based home web server Microsoft's generosity knows no end for a year (comic) Using the Acer Aspire One as a web server When programmers design web sites (comic) Finding great ideas for your startup Game Theory, Salary Negotiation, and Programmers Coding tips they don't teach you in school When a reporter mangles your elevator pitch Test Driven Development without Tears Drawing Graphs with Physics Free up disk space in Ubuntu Keeping Abreast of Pornographic Research in Computer Science Exploiting perceptual colour difference for edge detection Experiment: Deleting a post from the Internet Is 2009 the year of Linux malware? Email Etiquette How a programmer reads your resume (comic) How wide should you make your web page? Usability Nightmare: Xfce Settings Manager cairo blur image surface Automatically remove wordiness from your writing Why Perforce is more scalable than Git Optimizing Ubuntu to run from a USB key or SD card UMA Questions Answered Make Windows XP look like Ubuntu, with Spinning Cube Effect See sound without drugs Standby Preventer Stock Picking using Python Spoke.com scam Stackoverflow.com Copy a cairo surface to the windows clipboard Simulating freehand drawing with Cairo Free, Raw Stock Data Installing Ubuntu on the Via Artigo Why are all my lines fuzzy in cairo? A simple command line calculator Tool for Creating UML Sequence Diagrams Exploring sound with Wavelets UMA and free long distance UMA's dirty secrets Installing the Latest Debian on an Ancient Laptop Dissecting Adsense HTML/ Javascript/ CSS Pretty Printer Web Comic Aggregator Experiments in making money online How much cash do celebrities make? Draw waveforms and hear them Cell Phones on Airplanes Detecting C++ memory leaks What does your phone number spell? A Rhyming Engine Rules for Effective C++ Cell Phone Secrets