The PenIsland Problem: Text-to-speech for domain names
"expertsexchange.com" is a domain name that can be read in multiple, unintended ways. Howshouldatexttospeechsystemresolvethisambiguity?
Recently, I was contracted to run a list of domain names through the custom-built pronunciation engine that powers my rhyming web site. On the first attempt, I found that the results were embarrassingly bad. A quick inspection revealed the problem: most domain names are severalwordsstucktogether.
When a pronunciation by analogy system encounters an unknown word, it searches its knowledge base for words that look similar, and tries to stitch together their pronunciations. In this case, it was doing just what it was supposed to do. For example, lots of words end with an 'e', and usually that 'e' is silent when at the end of a word. But stick another word on, and the system would try to pronounce the 'e', just like a six-year-old learning to read by sounding out each letter. Most people, on the other hand, would recognize the two words and say them each individually.
Try these domains in the AT&T text to speech system, which many consider to be the best in the world, at http://www.research.att.com/~ttsweb/tts/demo.php.
- thepiratebay.com (sounds like separately?)
- mydreamcloset.com (huh?)
- torrentspy.com (sounds like a polish name)
- 123greetings.com (AT&T is ridiculous with this one)
Time for a bit of dynamic programming. After finding an appropriate scoring function, we can break up text the same way a human reader would. We also use some simple heuristics to say numbers properly.
Although I don't have a speech synthesizer, you can check the raw pronunciation output using this form. The phonemes correspond to the ones in the CMU pronouncing dictionary.
You can cheat so your web site seems faster than it isYou can make your web site seem faster without actually being faster.
Why you should go to the Business of Software Conference Next YearMost people, having already paid $2000.00 of their hard earned money, and then having flown, driven, or otherwise travelled to Boston to attend a conference, and then having paid an additional $250/night plus $33/night parking and "tourism taxes" to the Seaport Hotel -- most people, after all this, are unlikely to say that it was a waste of time and they should have stayed home watching the remaining salvaged episodes of Doctor Who on Netflix.
In fact, I found it quite useful.