
Exploring sound with Wavelets

Posted 17 years ago

Here's a program to create scalograms of sound files. Pictured below is the Windows XP startup sound. See how the individual frequencies have been isolated visually.

I have created a separate web page for this project... please go there.

Download Installer

I've been curious about wavelets since I did a course project on them.

The wavelet transform is similar to Fourier analysis, in that it figures out which frequencies exist in a given signal. The difference is that it adds another dimension to the data. From a 1-D waveform, you will get a 2-D picture. Each row is a frequency, and the columns are times. So you get a picture of how the frequency changes with time.

The discrete wavelet transform (DWT) does speed up the wavelet transform greatly, and mathematically, no information is lost. However, it is not a very good way to look at the data visually. From 1024 samples, you only get 10 frequency bands. There's no way to, for instance, distinguish individual notes in a song. Here's an example of what you'd get from the DWT. Compare it to the first image, and you'll see how much information is hidden!

Figure 1: Ten frequency bands from 512 samples. Where did all the information go???
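
To make the band count concrete, here is a minimal sketch of a full DWT using the Haar wavelet. The Haar choice and the `haar_dwt` name are assumptions for illustration; the figure above may well have been made with a different wavelet. Each pass halves the signal, so 1024 samples yield only ten detail bands (plus one leftover average), even though there are still 1024 coefficients in total.

```python
import math


def haar_dwt(samples):
    """Full Haar DWT of a power-of-two-length signal.

    Returns a list of bands, coarsest first: the single remaining
    average, then detail bands of length 1, 2, 4, ...
    """
    approx = list(samples)
    bands = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / root2 for a, b in pairs]   # high-pass half
        approx = [(a + b) / root2 for a, b in pairs]   # low-pass half
        bands.append(detail)
    bands.append(approx)  # final approximation (one coefficient)
    bands.reverse()
    return bands


# A made-up test tone: 50 cycles across 1024 samples.
signal = [math.sin(2 * math.pi * 50 * n / 1024) for n in range(1024)]
bands = haar_dwt(signal)

detail_bands = len(bands) - 1
total_coefficients = sum(len(band) for band in bands)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for band in bands for v in band)

print(detail_bands, "detail bands,", total_coefficients, "coefficients")
print("energy before/after:", energy_in, energy_out)
```

The energy check is the "no information is lost" part: the transform is orthonormal, so the two numbers agree up to rounding. The problem is purely one of presentation, since ten rows make for a very short picture.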

Because of the DWT, very few people give the CWT (continuous wavelet transform) a second glance. The library is filled with books on wavelets that spend two pages on the CWT, and then talk for the rest of the book about applying the DWT. As a result, people think the DWT is all there is.

Another technique, called the wavelet packet transform, gives you a little more detail. But at the end of it, if you have 1024 sound samples, you will have 1024 transformed points. The more times you perform the algorithm, the more detail you lose in time (and the image looks like a pixelated mess).

Continuous Wavelet Transform

My program applies the continuous wavelet transform to a wave file that you load in, and lets you zoom in to see the individual frequencies that make up a sound. Give it a try!

One problem with it is that it generates a lot of data. Analyzing that sound took 170 MB of memory and a couple of minutes on my computer. If you tried it on a 5 minute MP3 file, that's 5 × 60 seconds × 44100 samples per second × 44100/60 frequency bands = 9.7 billion data points, or about 39 GB of 4-byte floating point data, and that's without stereo!
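
The back-of-the-envelope estimate above can be checked in a few lines. This is only a sketch of the arithmetic; the 4 bytes per value is an assumption (single-precision floats), and the variable names are made up for the example.

```python
SAMPLE_RATE = 44100          # samples per second (CD-quality audio)
BYTES_PER_VALUE = 4          # assumption: single-precision floats

duration_seconds = 5 * 60
samples = duration_seconds * SAMPLE_RATE
bands = SAMPLE_RATE // 60    # one band per scale, as in the estimate above
points = samples * bands
total_bytes = points * BYTES_PER_VALUE

print(f"{samples:,} samples x {bands} bands = {points:,} points")
print(f"{total_bytes / 1e9:.1f} GB for one channel")
```

Doubling `BYTES_PER_VALUE` for double precision, or keeping both stereo channels, doubles the total again.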

But it does produce some pretty pictures for short files. Here's a closeup view of the famous tada.wav:

Closeup on a small section of tada.wav

The majestic noise of the "c:\windows\media\recycle.wav" paper crumpling makes a great wallpaper.

How it works

The program loads in a wave file using libsnd. If it is stereo or multichannel, the other channels are ignored and only the first channel is used.
When you see "Rendering... 1%" on the screen, the program is busy calculating the wavelet transform. It first calculates some frequency scales, from 2 samples to sampleRate divided by 60 samples long, and goes through them logarithmically (e.g. 2, 4, 8, 16 samples long).
For each scaling factor, it creates a wavelet with a "real" and an "imaginary" part, whose period is that many samples long. The real part is the cosine function multiplied by a Gaussian, and the imaginary part is the same thing, but with a sine function. This is known as the Morlet wavelet, and it is exceptionally good for sound analysis due to the sine and cosine basis.
Once it has created the wavelet, it convolves the wavelet with the signal. Convolution is kind of like smearing one signal with another. To speed up the algorithm, I perform convolution by multiplying the Fourier transforms of the signal and the wavelet. After the convolution, we end up with the strength of the wavelet in the signal at each point in time.
The process is repeated for each scale level.
Now we have real and imaginary data samples. The magnitude of each data sample is converted into a pixel of a huge device independent bitmap in memory, so it can be displayed on the screen. I hope you have lots of RAM.
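
The steps above can be sketched in plain Python. This is not the program's actual source, just an illustration under stated assumptions: the names `fft`, `log_scales`, `morlet` and `cwt_magnitude` are made up, as are the `steps_per_octave` and `cycles` knobs, the cutoff at four standard deviations, and the normalization of the Gaussian. It uses a small hand-written radix-2 FFT to stay self-contained; a real implementation would use an FFT library.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse is left unnormalized (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def log_scales(sample_rate, steps_per_octave=1):
    """Wavelet periods in samples: 2, 4, 8, ... up to sample_rate / 60."""
    scales, k = [], 0
    while 2.0 * 2.0 ** (k / steps_per_octave) <= sample_rate / 60:
        scales.append(2.0 * 2.0 ** (k / steps_per_octave))
        k += 1
    return scales


def morlet(period, cycles=1.0):
    """Gaussian times (cos + i*sin). Returns (values, half_width)."""
    sigma = cycles * period            # assumed Gaussian width
    half = int(math.ceil(4 * sigma))   # assumed cutoff at 4 sigma
    ts = range(-half, half + 1)
    gauss = [math.exp(-t * t / (2 * sigma * sigma)) for t in ts]
    total = sum(gauss)
    wave = [g / total * cmath.exp(2j * math.pi * t / period)
            for g, t in zip(gauss, ts)]
    return wave, half


def cwt_magnitude(signal, sample_rate, steps_per_octave=1):
    """One row of magnitudes per scale, one column per sample."""
    scales = log_scales(sample_rate, steps_per_octave)
    widest = int(math.ceil(4 * scales[-1]))
    n = 1
    while n < len(signal) + 2 * widest + 1:   # pad so nothing wraps around
        n *= 2
    spectrum = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    rows = []
    for period in scales:
        wave, half = morlet(period)
        kernel = [0j] * n
        for i, w in enumerate(wave):
            kernel[(i - half) % n] = w        # centre the wavelet on index 0
        product = [a * b for a, b in zip(spectrum, fft(kernel))]
        conv = fft(product, inverse=True)     # convolution via FFT
        rows.append([abs(c) / n for c in conv[:len(signal)]])
    return scales, rows


# Made-up test input: a pure tone that repeats every 16 samples.
sample_rate = 960
signal = [math.cos(2 * math.pi * t / 16) for t in range(512)]
scales, rows = cwt_magnitude(signal, sample_rate)
mid = len(signal) // 2
strongest = max(range(len(scales)), key=lambda i: rows[i][mid])
print("scales:", scales)
print("strongest period at the midpoint:", scales[strongest])
```

Each entry of `rows` is one horizontal line of the scalogram; mapping the magnitudes to pixel brightness gives the picture. Raising `steps_per_octave` adds rows between the octaves, which is where the memory goes.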

Future Work

If I have time in the new year, I'm going to add some fun stuff:

Drag and drop pitch shifting - This is not as easy as moving pixels on the image... first you have to do something called "phase unwrapping". My first cut at a phase unwrapping algorithm didn't work, so I'm trying to translate some Fortran code from a 1981 paper I found. Does anybody have some C code for this???
Boost/Reduce - Draw a square with the mouse and boost or reduce the strength of that region. This could be great for manual noise elimination and sound retouching, or restoring that old copy of Brahms playing the piano.
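
For readers wondering what phase unwrapping means, here is the simplest textbook version in one dimension. It is not the algorithm from the 1981 paper, and the `unwrap` name is made up for the example; it only works when the true phase changes by less than pi between neighbouring samples, which is exactly the assumption that tends to break on real data.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps from a sequence of phases given in radians.

    Assumes the true phase never changes by more than pi between
    consecutive samples.
    """
    if not phases:
        return []
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        jump = cur - prev
        if jump > math.pi:        # wrapped downward: shift back by one turn
            offset -= 2 * math.pi
        elif jump < -math.pi:     # wrapped upward: shift forward by one turn
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


# A steadily growing phase, as it would look after being wrapped
# into the range (-pi, pi].
true_phase = [0.5 * n for n in range(40)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap(wrapped)

worst_error = max(abs(a - b) for a, b in zip(true_phase, recovered))
print("largest error after unwrapping:", worst_error)
```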

Steve Hanov makes a living working on Rhymebrain.com, rapt.ink, www.websequencediagrams.com, and Zwibbler.com. He lives in Waterloo, Canada.


Owen Hann

eleven years ago

Hi Steve,

You should check out "cooledit pro", at least version 1.2a, for the sort of stuff you mention above 'boost/reduce'. It has a really nifty method of reducing noise in audio.

I don't know much about wavelets - trying to learn. Are they similar to the "constant-Q transform"?

Paddy Padmanabhan

15 years ago

Dear Steve,

Very interesting and useful work indeed. I landed on your link as I am researching the application of wavelets to pitch bending of music notes.

I can be reached on upaddy [AT] yahoo [DOT] com

I'd love to get connected with you.

Regards and best wishes.

Paddy.
