Exploring sound with Wavelets
I've been curious about wavelets since I did a course project on them.
The wavelet transform is similar to Fourier analysis, in that it figures out which frequencies exist in a given signal. The difference is that it adds another dimension to the data. From a 1-D waveform, you get a 2-D picture: each row is a frequency and each column is a point in time, so you can see how the frequency content changes over time.
The discrete wavelet transform (DWT) speeds up the wavelet transform greatly, and mathematically no information is lost. However, it is not a very good way to look at the data visually. From 1024 samples, you only get 10 frequency bands (one per octave). There's no way, for instance, to distinguish individual notes in a song. Here's an example of what you'd get from the DWT. Compare it to the result from the first image, and you'll see how much information is hidden!
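To see where the 10 bands come from, here is a minimal sketch of a multilevel DWT. It uses the Haar wavelet only because it is the simplest; the text doesn't say which wavelet produced Figure 1, and `haar_dwt` is a hypothetical helper. Each level halves the remaining lowpass signal and peels off one octave of detail, so 1024 = 2^10 samples give 10 detail bands, while the total number of coefficients still equals the input length.

```python
import math


def haar_dwt(signal):
    """Multilevel Haar DWT of a signal whose length is a power of two.

    Returns (detail_bands, approximation). detail_bands[0] is roughly the
    top octave; detail_bands[-1] is the lowest one.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = 1 / math.sqrt(2)  # orthonormal scaling, so energy is preserved
    approx = list(signal)
    bands = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        # Differences of neighbours = high-frequency detail for this octave.
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        # Sums of neighbours = half-length lowpass signal for the next level.
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        bands.append(detail)
    return bands, approx


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 50 * t / 1024) for t in range(1024)]
    bands, approx = haar_dwt(x)
    print(len(bands))               # 10
    print([len(b) for b in bands])  # [512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
    print(sum(len(b) for b in bands) + len(approx))  # 1024
```

The top band gets 512 coefficients while the lowest gets a single one, which is why the low end of a DWT picture is so coarse in time.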
Figure 1: Ten frequency bands from 512 samples. Where did all the information go???
Because of the DWT, very few people give the CWT (continuous wavelet transform) a second glance. The library is filled with books on wavelets that spend two pages on the CWT, and then talk for the rest of the book about applying the DWT. As a result, people think the DWT is all there is.
Another technique, called the wavelet packet transform, gives you a little more detail. But in the end, if you have 1024 sound samples, you still have only 1024 transformed points. Each extra level of decomposition splits every band in two, so you gain detail in frequency but lose detail in time (and the image looks like a pixelated mess).
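Here is a minimal sketch of the wavelet packet idea, again with Haar filters for simplicity (an assumption, not necessarily what any image here used); `haar_step` and `wavelet_packet` are hypothetical helpers. Unlike the DWT, every band is split again at each level, so after level L you have 2^L bands, each 1024 / 2^L samples long, and the coefficient count never grows past 1024.

```python
import math


def haar_step(x):
    """One Haar analysis step: returns (lowpass, highpass), each half as long."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Split every band again at each level (the DWT only re-splits the
    lowpass band). Returns the 2**levels bands of the last level, in the
    tree's natural order, which is not strictly increasing frequency.
    Assumes len(signal) is divisible by 2**levels."""
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for b in bands:
            low, high = haar_step(b)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


if __name__ == "__main__":
    x = [math.sin(0.3 * t) for t in range(1024)]
    for level in (1, 5, 10):
        bands = wavelet_packet(x, level)
        print(level, len(bands), len(bands[0]), sum(len(b) for b in bands))
    # 1 2 512 1024
    # 5 32 32 1024
    # 10 1024 1 1024
```

At level 10 each of the 1024 bands holds a single number: perfect frequency detail, no time detail at all.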
Continuous Wavelet Transform
My program applies the continuous wavelet transform to a wave file that you load, and lets you zoom in to see the individual frequencies that make up a sound. Give it a try!
One problem is that it generates a lot of data. Analyzing that sound took 170 MB of memory and a couple of minutes on my computer. If you tried it on a 5-minute MP3 file, that's 5 × 60 seconds × 44,100 samples per second × (44,100/60 = 735) frequency bands ≈ 9.7 billion data points, or nearly 39 GB of 4-byte floating-point data, even without stereo!
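The back-of-the-envelope estimate is easy to check. This sketch follows the text's assumptions: mono audio, 44,100/60 = 735 frequency bands, and one 4-byte float per data point; `cwt_size` is a hypothetical helper. Keeping the complex coefficients instead of magnitudes would double the total (pass `bytes_per_value=8`).

```python
def cwt_size(seconds, sample_rate=44100, lowest_hz=60, bytes_per_value=4):
    """Data points and bytes for a mono CWT with sample_rate // lowest_hz
    frequency bands (735 at 44.1 kHz), as in the estimate in the text."""
    samples = seconds * sample_rate
    bands = sample_rate // lowest_hz
    points = samples * bands
    return points, points * bytes_per_value


points, nbytes = cwt_size(5 * 60)
print(f"{points:,} points, {nbytes / 1e9:.1f} GB")  # 9,724,050,000 points, 38.9 GB
```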
But it does produce some pretty pictures for short files. Here's a closeup view of the famous tada.wav:
Closeup on a small section of tada.wav
The majestic paper-crumpling noise of "c:\windows\media\recycle.wav" makes a great wallpaper.
How it works
- The program loads a wave file using libsndfile. If the file is stereo or multichannel, only the first channel is used and the others are ignored.
- When you see "Rendering... 1%" on the screen, the program is busy calculating the wavelet transform. It first chooses a set of scales: wavelet periods from 2 samples (the Nyquist frequency) up to sampleRate/60 samples (60 Hz), stepped logarithmically (e.g. 2, 4, 8, 16 samples long).
- For each scale, it creates a complex wavelet whose period is that many samples long. The real part is a cosine multiplied by a Gaussian, and the imaginary part is the same Gaussian multiplied by a sine. This is known as the Morlet wavelet, and because it is essentially a sinusoid under a smooth window, it is well suited to sound analysis.
- Once it has created a wavelet, it convolves it with the signal. Convolution is a bit like smearing one signal with another. To speed things up, I perform the convolution by multiplying the Fourier transforms of the signal and the wavelet and transforming back (the convolution theorem), which takes O(N log N) time with an FFT instead of O(N·M) for direct convolution with an M-sample wavelet. The result is the strength of the wavelet in the signal at each point in time.
- The process is repeated for each scale level.
- Now we have complex data samples (a real and an imaginary part). Their magnitudes are converted into a huge device-independent bitmap in memory so it can be displayed on the screen. I hope you have lots of RAM.
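The steps above can be sketched in a self-contained Python program that uses only the standard library, so it runs anywhere, just slowly. It is not the program's actual code. The names (`fft`, `morlet`, `log_periods`, `cwt_row`, `scalogram`, `to_grayscale`), the Morlet parameter `w0 = 6` (a common choice), truncating the Gaussian at about ±3 standard deviations, one scale per octave, and the 8-bit grayscale mapping are all assumptions for illustration. It also omits the tiny correction term that makes the textbook Morlet wavelet exactly zero-mean.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse is unnormalised, so the caller divides by len(x)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def morlet(period, w0=6.0):
    """Morlet-style wavelet with an oscillation period of `period` samples:
    real part = Gaussian * cos, imaginary part = Gaussian * sin."""
    sigma = w0 * period / (2 * math.pi)  # Gaussian width in samples
    half = int(3 * sigma)                # truncate at about +/-3 sigma
    ts = range(-half, half + 1)
    g = [math.exp(-0.5 * (t / sigma) ** 2) for t in ts]
    scale = 2 / sum(g)  # unit-amplitude sinusoid at this period -> magnitude ~1
    return [scale * gi * cmath.exp(2j * math.pi * t / period) for gi, t in zip(g, ts)]


def log_periods(sample_rate, per_octave=1):
    """Periods from 2 samples up to sample_rate / 60 samples,
    spaced logarithmically (2, 4, 8, 16, ... when per_octave == 1)."""
    longest = sample_rate / 60
    periods, p = [], 2.0
    while p <= longest:
        periods.append(p)
        p *= 2 ** (1 / per_octave)
    return periods


def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p


def cwt_row(signal, wavelet):
    """Convolve signal with wavelet by multiplying their FFTs. Zero-padding
    to >= len(signal) + len(wavelet) - 1 makes the circular convolution
    equal the linear one; the slice re-centres the output on each sample."""
    n = next_pow2(len(signal) + len(wavelet) - 1)
    a = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    b = fft(list(wavelet) + [0j] * (n - len(wavelet)))
    full = fft([p * q for p, q in zip(a, b)], inverse=True)
    half = len(wavelet) // 2
    return [v / n for v in full[half:half + len(signal)]]


def scalogram(signal, sample_rate):
    """Rows = scales (shortest period, i.e. highest frequency, first);
    columns = time. Each entry is the magnitude of a complex coefficient."""
    return [[abs(c) for c in cwt_row(signal, morlet(p))]
            for p in log_periods(sample_rate)]


def to_grayscale(rows):
    """Scale magnitudes to 0..255, one byte per pixel, like an 8-bit bitmap."""
    peak = max(max(r) for r in rows) or 1.0
    return [bytes(round(255 * v / peak) for v in r) for r in rows]


if __name__ == "__main__":
    rate = 8000
    # Hypothetical test signal: a 500 Hz tone (period 16 samples at 8 kHz).
    tone = [math.sin(2 * math.pi * t / 16) for t in range(2048)]
    rows = scalogram(tone, rate)
    for p, r in zip(log_periods(rate), rows):
        print(f"period {p:5.0f} samples: magnitude {r[1024]:.3f}")
```

Because this wavelet satisfies ψ(−t) = conj(ψ(t)), convolving with it gives the same result as correlating with its conjugate, which is the usual CWT definition. Each row costs three FFTs, but memory still grows as (number of scales) × (number of samples), which is where the RAM goes.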
If I have time in the new year, I'm going to add some fun stuff:
- Drag-and-drop pitch shifting -- This is not as easy as moving pixels on the image... first you have to do something called "phase unwrapping". My first cut at a phase unwrapping algorithm didn't work, so I'm trying to translate some Fortran code from a 1981 paper I found. Does anybody have some C code for this?
- Boost/Reduce -- Draw a square with the mouse and boost or reduce the strength of that region. This could be great for manual noise elimination and sound retouching, or restoring that old copy of Brahms playing the piano.
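Neither feature exists yet, but two building blocks can be sketched. The first is plain 1-D phase unwrapping along one scale row (the same idea as NumPy's `numpy.unwrap`): whenever the phase jumps by more than π between neighbouring samples, add or subtract a multiple of 2π. Unwrapped phase matters because its rate of change is the instantaneous frequency. This is the simple textbook method, not the 1981 algorithm mentioned above, and it can fail on noisy or near-zero coefficients. The second is the boost/reduce step: multiplying the complex coefficients inside a rectangle by a gain. Turning edited coefficients back into sound needs an inverse transform, which is not shown. Both function names are hypothetical.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps from a sequence of wrapped phases (radians).
    Each step between neighbours is forced into [-pi, pi] by adding a
    multiple of 2*pi, and the corrections accumulate along the sequence."""
    if not phases:
        return []
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        offset -= 2 * math.pi * round((cur - prev) / (2 * math.pi))
        out.append(cur + offset)
    return out


def boost_region(coeffs, rows, cols, gain):
    """Return a copy of a 2-D list of complex coefficients in which the
    rectangle rows[0] <= i < rows[1], cols[0] <= j < cols[1] is multiplied
    by `gain` (e.g. 2.0 to boost, 0.0 to erase a noise burst)."""
    r0, r1 = rows
    c0, c1 = cols
    return [[v * gain if r0 <= i < r1 and c0 <= j < c1 else v
             for j, v in enumerate(row)]
            for i, row in enumerate(coeffs)]


if __name__ == "__main__":
    # A steadily rotating phase (0.5 rad per sample), wrapped by cmath.phase.
    wrapped = [cmath.phase(cmath.exp(0.5j * n)) for n in range(20)]
    print(round(unwrap(wrapped)[-1], 6))  # 9.5
```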