The PenIsland Problem: Text-to-speech for domain names
"expertsexchange.com" is a domain name that can be read in multiple, unintended ways. Howshouldatexttospeechsystemresolvethisambiguity?
Recently, I was contracted to run a list of domain names through the custom-built pronunciation engine that powers my rhyming web site. On the first attempt, I found that the results were embarrassingly bad. A quick inspection revealed the problem: most domain names are severalwordsstucktogether.
When a pronunciation-by-analogy system encounters an unknown word, it searches its knowledge base for words that look similar and tries to stitch together their pronunciations. In this case, it was doing just what it was supposed to do. For example, lots of words end with an 'e', and that 'e' is usually silent at the end of a word. But stick another word on the end, and the system tries to pronounce the 'e', just like a six-year-old learning to read by sounding out each letter. Most people, on the other hand, would recognize the two words and say each one individually.
Try these domains in the AT&T text-to-speech system, which many consider to be the best in the world, at http://www.research.att.com/~ttsweb/tts/demo.php.
- thepiratebay.com (sounds like "separately"?)
- mydreamcloset.com (huh?)
- torrentspy.com (sounds like a Polish name)
- 123greetings.com (AT&T is ridiculous with this one)
Time for a bit of dynamic programming. After finding an appropriate scoring function, we can break up text the same way a human reader would. We also use some simple heuristics to say numbers properly.
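To make the idea concrete, here is a minimal sketch of how such a segmentation could work. It is not the code behind the engine described here: the scoring function (summed log-probabilities of individual words), the tiny `WORD_COUNTS` table, and the names `word_score`, `segment`, and `split_domain` are all assumptions made for illustration. A real system would load a large word-frequency list, and this sketch only separates digit runs from letters rather than deciding how to say the numbers.

```python
import math
import re

# Hypothetical word counts, made up for this example.
WORD_COUNTS = {
    "the": 500, "pirate": 20, "bay": 30, "my": 300, "dream": 40,
    "closet": 10, "torrent": 5, "spy": 8, "greetings": 6,
    "experts": 12, "exchange": 15, "expert": 14, "sex": 10, "change": 25,
    "pen": 10, "island": 12, "is": 400, "land": 20,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in WORD_COUNTS)


def word_score(word):
    """Log-probability of a known word; a heavy per-letter penalty otherwise."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    return -20.0 * len(word)


def segment(text):
    """Split a run of letters into the highest-scoring sequence of words."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_score(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def split_domain(domain):
    """Drop the TLD, separate digit runs from letter runs, segment the letters."""
    label = domain.lower().split(".")[0]
    tokens = []
    for run in re.findall(r"[a-z]+|[0-9]+", label):
        tokens.extend([run] if run.isdigit() else segment(run))
    return tokens


print(split_domain("thepiratebay.com"))
print(split_domain("123greetings.com"))
```

Because each extra word adds another negative log-probability, this kind of score tends to prefer fewer, longer words when every candidate word is known. With the made-up counts above, "expertsexchange" comes out as "experts exchange" rather than the three-word reading.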
Although I don't have a speech synthesizer, you can check the raw pronunciation output using this form. The phonemes correspond to the ones in the CMU pronouncing dictionary.
It's German, and it means "shower light" or "you bitch" ;-)
I have no experience with dynamic programming, and unlike your phonenum-spelling post, it's hard for me to see exactly how the problem was broken down. I assume you did this with your 'scoring function'. Would you mind quickly jotting down some pseudocode for how this function works?
Cheers!