A Rhyming Engine

Here's a rhyming engine, written in 1000 lines of C++ code. It uses the freely available Moby dictionary, and full source code is provided. Give it a try. Read on for technical information.

Please try the updated rhyming web site.



Results:

Relationship to Rhymebrain.com

I originally wrote this post in 2007. For the past five years I have been researching linguistics and machine learning techniques and created rhymebrain.com. Rhymebrain, at its core, still uses the same techniques, but it does it much faster and supports any human language. Rhymebrain is propreitary. However, the source code presented with this article is released to the public domain.

How it works

Programs involving the english language are quite interesting, because there really are no rules. English is very complicated. A great resource is the Moby project, which is a public domain dictionary. It includes a text file providing word lists, including pronunciation and parts of speech.

Although I demonstrate only the rhyming part, my WordDatabase object uses both the part of speech and pronunciation information. This is for future expansion. Download source code here

Overview

  • As part of compilation, a perl script combines the Parts Of Speech and Pronunciation dictionary into a single file, dict.txt.
  • The rhyming engine reads dict.txt and, for every word, creates a Word object.
  • The Word object has the part of speech. It also contains a representation of the pronunciation of the word.
  • To figure out if two words rhyme, we compare the last part of their pronunciation. The more syllables that rhyme, the better the rhyme is.
  • The output is sorted so that better rhymes are listed first.

Similar projects

I'm not aware of many similar projects. Somebody named "tuffy" made a rhyming dictionary on sourceforge: rhyme.sourceforge.net. However, for the life of me, I can't figure out why it needs to use an external database package. My technique does not need to pre-compute the rhymes, and it is half as many lines of code.

Personally, I use the rhymezone.com rhyming dictionary for my rhyming needs.

The Rhyming API

class WordDatabase
{
public:
    WordDatabase();
    ~WordDatabase();

    bool load(const char* filename);
    bool findRhymes( DynamicArray& results, const char* word, WordFilter* filter,
            WordArray* wordList = 0);
    Word* lookup( const char* whichword );
    void filter( WordArray& results, WordFilter* filter );
    bool makeWords( WordArray& results, const char* text );
    bool loadThesaurus( const char* thesaurus );
    void addSynonym( Word* base, Word* synonym );
    unsigned getSynonyms( WordArray& results, const char* wordtext );

private:	
    StringMap _wordMap;
    DynamicArray _wordArray;
};

In this introduction, I use the word phoneme to mean a part of a word, as pronounced. Using a sequence of phonemes, and whether each phoneme is emphasized, you can completely describe how to pronounce a word, and hence derive its rhymes and syllables.

Preprocessing

The Moby project has separate files for parts of speech and pronunciation. So I wrote a perl script to combine the two files. The resulting word list only includes words that have both part of speech and pronunciation information.

From the readme file, I believe that the Moby pronunciation was derived by a British person. Fortunately, for the purposes of rhyming, it doesn't really matter. "Potato" will rhyme with "Tomato" no matter if you say "Po-tay-to" or "po-tah-toh".

A standard set of phonemes

For this project, I also leave it open to include the CMU dictionary, another free online dictinary. The problem is that these two dictionaries use a different set of phonemes. I had to figure out a mapping so that I could possible combine both dictionaries later on. The mapping is below. The fields are:
  • An enumeration for the phoneme
  • How the phoneme is displayed. This is for debugging purposes only, and is different from what is read in from the dictionary file.
  • Whether it represents a syllable.
PhoneSet_t PhoneSet[] = {
    { a_in_dab,       "ae",  1 },
    { a_in_air,       "ey",  1 },
    { a_in_far,       "ao",  1 },
    { a_in_day,       "ay",  1 },
    { a_in_ado,       "ah",  1 },
    { ir_in_tire,     "ire", 1 },
    { b_in_nab,       "b",   0 },
    { ch_in_ouch,     "ch",  0 },
    { d_in_pod,       "d",   0 },
    { e_in_red,       "e",   1 },
    { e_in_see,       "ee",  1 },
    { f_in_elf,       "f",   0 },
    { g_in_fig,       "g",   0 },
    { h_in_had,       "h",   0 },
    { w_in_white,     "wh",  0 },
    { i_in_hid,       "i",   1 },
    { i_in_ice,       "eye", 1 },
    { g_in_vegetably, "g",   0 },
    { c_in_act,       "k",   0 },
    { l_in_ail,       "l",   0 },
    { m_in_aim,       "m",   0 },
    { ng_in_bang,     "ng",  0 },
    { n_in_and,       "n",   0 },
    { oi_in_oil,      "oy",  1 },
    { o_in_bob,       "aa",  1 },
    { ow_in_how,      "ow",  1 },
    { o_in_dog,       "ah",  1 },
    { o_in_boat,      "oh",  1 },
    { oo_in_too,      "oo",  1 },
    { oo_in_book,     "ooh", 1 },
    { p_in_imp,       "p",   0 },
    { r_in_ire,       "er",  0 },
    { sh_in_she,      "sh",  0 },
    { s_in_sip,       "s",   0 },
    { th_in_bath,     "dth", 0 },
    { th_in_the,      "th",  0 },
    { t_in_tap,       "t",   0 },
    { u_in_cup,       "uh",  1 },
    { u_in_burn,      "u",   1 },
    { v_in_average,   "v",   0 },
    { w_in_win,       "w",   0 },
    { y_in_you,       "y",   0 },
    { s_in_vision,    "zh",  0 },
    { z_in_zoo,       "z",   0 },
    { a_in_ami,       "a",   1 },
    { n_in_francoise, "n",   0 },
    { r_in_der,       "r",   0 },
    { ch_in_bach,     "chh", 0 },
    { eu_in_bleu,     "eu",  1 },
    { u_in_duboise,   "u",   1 },
    { wa_in_noire,    "WA",  1 }
};

The combined dictionary

Here's the first few lines of the combined dictionary, which includes 110000 words. Each line has three parts. The first is the word, the second is the part of speech (N is noun, etc) and the third is the pronunciation. Right now, I use only the moby dictionary, so the pronunciation keys are weird characters as described in the moby readme file. Each phoneme is separated by a slash character.
A N /eI/
AWOL A '/eI/w/A/l
Aachen N '/A/k/@/n
Aalborg N '/O/lb/O/rg
Aalesund N '/O/l/@/,s/U/n
Aalst N /A/lst
Aalto N '/A/lt/O/
Aar N /A/r

Syllables

It took a bit of thinking to figure out how to count the syllables in the word. Believe it or not, it can be derived from the phoneme alone. The first approach I tried is to split the word into syllables based on consonents. This didn't work. In fact, you get the best results if you split based on vowels. (Linguists reading this are now shouting at the monitor, "Of course, you fool!") In the PhoneSet table above, I mark a phoneme with a 1 if it represents a new syllable. The number of syllables in a words can be derived by adding up the value of the phoneme.

Representing words

On startup, the program takes a second to read the dictionary file. It parses the phoneme (separated by slash characters) into a word structure, where each phoneme is represented by a 16 bit number. If the phoneme begins a new syllable, the upper 2 bits are:
  • 0 if the phoneme doesn't begin a new syllable
  • 1 if the phoneme is a secondary stress
  • 2 if the phoneme is of primary stress

Rhyming

Rhyming is not very complex once the words are in the phoneme format. To determine whether two words rhyme, you simple compare the suffixes of their pronunciation. Stresses are included, as well. Amateur song writers often try to rhyme things like "hello" and "yellow" and they come up with horrible lyrics, since the words are stressed differently. Since the stresses are stored with the phonemes, a simple string comparison will take care of this automatically, and these words will not be found to rhyme.

Other features

Although I didn't include it here, the word database can optionally load in the moby thesaurus, "mobythes.aur". Thus you can figure out if any word has the same meaning as any other.

You can use a "WordFilter" object to filter the results to have a certain number of syllables, or match a set of stresses. For example, you can request a noun phrase of 5 syllables to match the stresses "0101010101", which is iambic pentameter.

Future work

Lots of stuff can be done. I have noticed in hip hop and rap music, the rhyming rules are relaxed greatly. For example, eminem might consider time and spine to be good rhymes. By redefining more phonemes to be identical, you can emulate this type of rhyming.

Comments