Building a better rhyming dictionary
Back in 2007, I created a rhyming engine based on the public domain Moby pronouncing dictionary. It simply reads the dictionary and looks for rhyming words by comparing the suffix of the words' pronunciations. Since that time, I have made some improvements.
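To make the suffix comparison concrete, here is a minimal sketch of the idea. It is not the engine's actual code. The tiny word list, the ARPAbet-style phoneme spelling with stress digits, and the choice to start the suffix at the last stressed vowel are all assumptions made for illustration.

```python
# Toy pronunciations written as phoneme lists; vowels carry a stress digit.
# These entries are illustrative and are not copied from the Moby dictionary.
PRONUNCIATIONS = {
    "cat": ["K", "AE1", "T"],
    "hat": ["HH", "AE1", "T"],
    "bat": ["B", "AE1", "T"],
    "dog": ["D", "AO1", "G"],
    "log": ["L", "AO1", "G"],
    "begin": ["B", "IH0", "G", "IH1", "N"],
    "within": ["W", "IH0", "DH", "IH1", "N"],
}


def rhyme_suffix(phonemes):
    """Return the phonemes from the last stressed vowel to the end of the word."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":  # stress digit 1 or 2 marks a stressed vowel
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no stressed vowel: compare the whole word


def find_rhymes(word, pronunciations=PRONUNCIATIONS):
    """Return every other word whose rhyme suffix equals that of `word`."""
    target = rhyme_suffix(pronunciations[word])
    return sorted(
        other
        for other, phonemes in pronunciations.items()
        if other != word and rhyme_suffix(phonemes) == target
    )


print(find_rhymes("cat"))
print(find_rhymes("begin"))
```

A linear scan like this is fine for a handful of words; it only shows what "comparing the suffix" means.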
Using a combination of techniques from artificial intelligence, math, and linguistics, the rhyming engine can now figure out how to say any word that you enter. That means if you enter a word that is not in the dictionary, it will still be able to find some rhymes.
Rather than looking for technically perfect rhymes, it suggests words that would sound good together in song or poetry. For example, we sometimes ignore consonants, as suggested by this 1985 paper. That way, fervently will rhyme with urgently despite the v/g mismatch.
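One way to picture the looser matching is to compare only the vowels from the stressed syllable onward. The sketch below does exactly that. Dropping every consonant is a simplification I am assuming here, and the phoneme spellings are hand-written approximations; the engine itself only ignores consonants some of the time.

```python
def vowel_skeleton(phonemes):
    """Vowels from the last primary-stressed vowel onward, with consonants dropped."""
    start = 0
    for i, phoneme in enumerate(phonemes):
        if phoneme.endswith("1"):  # primary stress marker
            start = i
    # In this notation only vowels end in a stress digit.
    return tuple(p for p in phonemes[start:] if p[-1].isdigit())


def loosely_rhymes(a, b):
    """True when two pronunciations share the same vowel skeleton."""
    return vowel_skeleton(a) == vowel_skeleton(b)


# Hand-written approximate pronunciations, for illustration only.
fervently = ["F", "ER1", "V", "AH0", "N", "T", "L", "IY0"]
urgently = ["ER1", "JH", "AH0", "N", "T", "L", "IY0"]
recently = ["R", "IY1", "S", "AH0", "N", "T", "L", "IY0"]

print(loosely_rhymes(fervently, urgently))  # the v/g mismatch is ignored
print(loosely_rhymes(fervently, recently))  # the stressed vowels differ
```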
There is a legal advantage to this technique as well. Many of the standard word lists used by natural language processing researchers include words from an old edition of the Oxford dictionary, and so cannot be used for "commercial purposes". That's why both Rhymezone and Write Express have a relatively limited dictionary size. My rhyming engine can sidestep this issue, since it only needs to be seeded with a small number of words from unrestricted sources, and it can then import words in bulk, and guess the pronunciations without using any restricted content.
I couldn't resist doing some premature optimization. The engine uses one of my favourite data structures, the trie. For each request, the program starts, reads the entire 260,000-word database, and completes in 60 ms on my netbook web server. It takes about 8 MB of memory. I guess that equates to about 8 MB x 0.06 s = 0.48 megabyte-seconds per request.
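For readers who have not met a trie before, here is a small sketch of how one could serve rhyme lookups. Keying the trie on phonemes in reverse order, so that words sharing an ending share a path from the root, is my assumption for the example; the post does not say how the real engine lays out its trie, and the class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # phoneme -> TrieNode
        self.words = []     # words whose pronunciation ends exactly here


class RhymeTrie:
    """Trie keyed on phonemes read from the end of the word backwards."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, phonemes):
        node = self.root
        for phoneme in reversed(phonemes):
            node = node.children.setdefault(phoneme, TrieNode())
        node.words.append(word)

    def words_ending_with(self, suffix):
        """Return all words whose pronunciation ends with `suffix`."""
        node = self.root
        for phoneme in reversed(suffix):
            node = node.children.get(phoneme)
            if node is None:
                return []
        # Everything below this node shares the suffix; collect it all.
        found = []
        stack = [node]
        while stack:
            current = stack.pop()
            found.extend(current.words)
            stack.extend(current.children.values())
        return sorted(found)


trie = RhymeTrie()
trie.insert("cat", ["K", "AE1", "T"])
trie.insert("hat", ["HH", "AE1", "T"])
trie.insert("dog", ["D", "AO1", "G"])
print(trie.words_ending_with(["AE1", "T"]))
```

The lookup cost depends on the length of the suffix and the number of matches, not on the size of the dictionary.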
Why is this hard?
Text to speech for English is still a hard problem to solve, and it is an active area of research. Consider the words rough, through, bough, thought, dough, cough, or photOgraph, photOgraphy, or physics, lymphatic, and loophole. In the 1980s, and still today in many cases, text to speech was done by hiring specially trained linguists to develop the thousands of rules necessary to create pronunciations. It is only in the last 10 years or so that this task has been automated. My system has over 200,000 hints on how to interpret each part of a word given its context. With further refinements, this could probably be reduced to tens of thousands, which is still a lot.
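To give a feel for what a context-dependent hint might look like, here is a toy sketch built around the "ph" in physics, lymphatic, and loophole. The hint format (left context, letters, right context, phonemes) and the "most specific match wins" scoring are hypothetical, invented for this example; they are not the format the engine uses.

```python
# Each hypothetical hint: (left context, letters, right context, phonemes).
HINTS = [
    ("", "ph", "", ["F"]),         # default: "ph" sounds like F
    ("oo", "p", "hole", ["P"]),    # but not across the join in "loophole"
    ("", "p", "", ["P"]),
    ("", "h", "", ["HH"]),
]


def best_hint(word, pos, hints=HINTS):
    """Pick the most specific hint that matches `word` at index `pos`.

    Returns (letters, phonemes), or None when no hint applies.
    """
    best = None
    best_score = -1
    for left, letters, right, phonemes in hints:
        end = pos + len(letters)
        if (word.startswith(letters, pos)
                and word[:pos].endswith(left)
                and word.startswith(right, end)):
            # More matched characters means a more specific hint.
            score = len(left) + len(letters) + len(right)
            if score > best_score:
                best, best_score = (letters, phonemes), score
    return best


print(best_hint("physics", 0))
print(best_hint("lymphatic", 3))
print(best_hint("loophole", 3))
```

Even this toy version needs a special case for one word, which hints at why the real count runs into the hundreds of thousands.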