
Exploring sound with Wavelets
Posted 17 years ago
Here's a program to create scalograms of sound files. Pictured below is the Windows XP startup sound. See how the individual frequencies have been isolated visually.
I have created a separate web page for this project... please go there.
Download Installer
I've been curious about wavelets since I did a course project on them.
The wavelet transform is similar to Fourier analysis, in that it figures out which frequencies exist in a given signal. The difference is that it adds another dimension to the data. From a 1-D waveform, you will get a 2-D picture. Each row is a frequency, and the columns are times. So you get a picture of how the frequency changes with time.
The DWT (discrete wavelet transform) does speed up the wavelet transform greatly, and mathematically, no information is lost. However, it is not a very good way to look at the data visually. From 1024 samples, you only get 10 frequency bands. There's no way to, for instance, distinguish individual notes in a song. Here's an example of what you'd get from the DWT. Compare it to the first image, and you see how much information is hidden!
Figure 1: Ten frequency bands from 512 samples. Where did all the information go???
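To see where the "10 bands from 1024 samples" figure comes from, here is a minimal sketch of a full DWT using the Haar wavelet. The Haar wavelet and the function name `haar_dwt_bands` are my own choices for illustration, not necessarily what produced the figure. Each pass halves the data, so the number of bands is the base-2 logarithm of the sample count.

```python
import math

def haar_dwt_bands(signal):
    """Full Haar DWT of a power-of-two-length signal.

    Returns (bands, approx): the detail bands, finest first, and the
    single leftover approximation coefficient.
    """
    approx = list(signal)
    bands = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture the detail at this scale...
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # ...and the averages are passed on to the next, coarser scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        bands.append(detail)
    return bands, approx

bands, approx = haar_dwt_bands([math.sin(0.1 * i) for i in range(1024)])
print(len(bands))               # number of frequency bands
print([len(b) for b in bands])  # time resolution of each band
```

The coarsest bands hold only a handful of values each, which is why the picture has so little to show at low frequencies.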
Because of the DWT, very few people give the CWT (continuous wavelet transform) a second glance. The library is filled with books on wavelets that spend two pages on the CWT, and then talk for the rest of the book about applying the DWT. As a result, people think the DWT is all there is.
Another technique, called the wavelet packet transform, gives you a little more detail. But at the end of it, if you have 1024 sound samples, you will have 1024 transformed points. The more times you perform the algorithm, the more detail you lose in time (and the image looks like a pixellated mess).
Continuous Wavelet Transform
My program applies the continuous wavelet transform to a wave file that you load in, and lets you zoom in to see the individual frequencies that make up a sound. Give it a try!
One problem with it is that it generates a lot of data. Analyzing that sound took 170 MB of memory and a couple of minutes on my computer. If you tried it on a 5-minute MP3 file, that's 5 × 60 seconds × 44100 samples per second × 44100/60 frequency bands = 9.7 billion data points, or about 39 GB of 4-byte floating point data, and that's if you don't use stereo!
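The arithmetic is easy to check. This is just a back-of-the-envelope sketch; the function name `cwt_data_points` and the assumption of 4 bytes per value (a single-precision float) are mine.

```python
def cwt_data_points(seconds, sample_rate=44100, bands=None):
    """One value per sample per frequency band, for a single channel."""
    if bands is None:
        bands = sample_rate // 60   # 735 bands at 44100 samples per second
    return seconds * sample_rate * bands

points = cwt_data_points(5 * 60)    # a 5-minute file
gigabytes = points * 4 / 1e9        # assuming 4 bytes per value
print(points, round(gigabytes, 1))
```

Storing magnitudes only, rather than separate real and imaginary values, is the cheaper of the two cases counted here; keeping both would double the total.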
But it does produce some pretty pictures for short files. Here's a closeup view of the famous tada.wav:
Closeup on a small section of tada.wav
The majestic noise of the "c:\windows\media\recycle.wav" paper crumpling makes a great wallpaper.
How it works
- The program loads in a wave file using libsnd. If it is stereo or multichannel, the other channels are ignored and only the first channel is used.
- When you see "Rendering... 1%" on the screen, the program is busy calculating the wavelet transform. It first calculates some frequency scales, from 2 samples long up to sampleRate divided by 60 samples long, and steps through them logarithmically (e.g. 2, 4, 8, 16 samples long).
- For each scaling factor, it creates a complex wavelet, with a real part and an imaginary part, whose period is that many samples long. The real part is a cosine multiplied by a Gaussian, and the imaginary part is the same thing with a sine instead. This is known as the Morlet wavelet, and it works exceptionally well for sound analysis because of its sine and cosine basis.
- Once it has created the wavelet, it convolves it with the signal. Convolution is kind of like smearing one signal with another. To speed up the algorithm, I perform the convolution by multiplying the Fourier transforms of the signal and the wavelet. After the convolution, we end up with the strength of the wavelet in the signal at each point in time.
- The process is repeated for each scale level.
- Now we have a real and an imaginary value for each sample at each scale. The magnitudes of these complex values are converted into a huge device-independent bitmap in memory so it can be displayed on the screen. I hope you have lots of RAM.
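The steps above can be sketched in a few lines of NumPy. This is not the program's actual code, just a rough illustration under some assumptions of mine: one scale per octave by default, a Gaussian width set by `omega0 = 6`, a wavelet truncated at four standard deviations, and a normalization that makes a pure tone of amplitude 1 show up with magnitude 0.5. All function names here are made up for the example.

```python
import numpy as np

def log_scales(sample_rate, per_octave=1):
    """Wavelet periods in samples, from 2 up to sample_rate / 60, log-spaced."""
    longest = sample_rate / 60.0
    count = int(np.floor(np.log2(longest / 2.0) * per_octave)) + 1
    return 2.0 * 2.0 ** (np.arange(count) / per_octave)

def morlet(period, omega0=6.0):
    """Gaussian times cosine (real part) and sine (imaginary part)."""
    sigma = omega0 * period / (2.0 * np.pi)   # envelope width in samples (assumed)
    half = int(np.ceil(4.0 * sigma))          # truncate the tails at 4 sigma
    t = np.arange(-half, half + 1)
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    return envelope * np.exp(2j * np.pi * t / period) / envelope.sum()

def cwt_magnitude(signal, sample_rate, per_octave=1):
    """Return (scales, image): one row of magnitudes per scale, one column per sample."""
    signal = np.asarray(signal, dtype=float)
    scales = log_scales(sample_rate, per_octave)
    rows = []
    for period in scales:
        wavelet = morlet(period)
        n = len(signal) + len(wavelet) - 1    # length of the full linear convolution
        nfft = 1 << (n - 1).bit_length()      # next power of two, for a fast FFT
        # Convolution = multiplication of the two Fourier transforms.
        spectrum = np.fft.fft(signal, nfft) * np.fft.fft(wavelet, nfft)
        full = np.fft.ifft(spectrum)[:n]
        start = (len(wavelet) - 1) // 2       # keep the part aligned with the signal
        rows.append(np.abs(full[start:start + len(signal)]))
    return scales, np.array(rows)

# A pure tone with a 16-sample period should light up the matching row.
tone = np.sin(2.0 * np.pi * np.arange(2048) / 16.0)
scales, image = cwt_magnitude(tone, sample_rate=8000)
print(scales[np.argmax(image.mean(axis=1))])
```

Raising `per_octave` gives more rows and a smoother picture, at the cost of one more FFT round trip and one more row of memory per extra scale.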
Future Work
If I have time in the new year, I'm going to add some fun stuff:
- Drag and drop pitch shifting - This is not as easy as moving pixels on the image... first you have to do something called "phase unwrapping". My first cut at a phase unwrapping algorithm didn't work, so I'm trying to translate some Fortran code from a 1981 paper I found. Does anybody have some C code for this???
- Boost/Reduce -- Draw a square with the mouse and boost or reduce the strength of that region. This could be great for manual noise elimination and sound retouching, or restoring that old copy of Brahms playing the piano.
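For anyone wondering what phase unwrapping means: the phase of a complex sample only comes out as an angle between -π and π, so a steadily advancing phase shows up as a sawtooth with sudden jumps. Unwrapping adds back the missing multiples of 2π. The sketch below uses NumPy's `np.unwrap`, which is the simplest possible approach and not the algorithm from the 1981 paper; it only works when the true phase changes by less than π between neighbouring samples, which is exactly where real audio data tends to cause trouble.

```python
import numpy as np

# A phase that advances steadily through three full turns (made-up test data).
true_phase = np.linspace(0.0, 6.0 * np.pi, 200)

# What you actually get from complex samples: angles wrapped into [-pi, pi].
wrapped = np.angle(np.exp(1j * true_phase))

# Remove the 2*pi jumps by comparing each sample with its neighbour.
unwrapped = np.unwrap(wrapped)

print(np.allclose(unwrapped, true_phase))
```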