
Exploring sound with Wavelets

Posted 16 years ago

Here's a program to create scalograms of sound files. Pictured below is the "Windows XP startup sound". See how the individual frequencies have been isolated visually.

I have created a separate web page for this project... please go there.

Download Installer

I've been curious about wavelets since I did a course project on them.

The wavelet transform is similar to Fourier analysis, in that it figures out which frequencies exist in a given signal. The difference is that it adds another dimension to the data. From a 1-D waveform, you will get a 2-D picture. Each row is a frequency, and the columns are times. So you get a picture of how the frequency changes with time.

The DWT (discrete wavelet transform) does speed up the wavelet transform greatly, and mathematically, no information is lost. However, it is not a very good way to look at the data visually. From 1024 samples, you only get 10 frequency bands. There's no way to, for instance, distinguish individual notes in a song. Here's an example of what you'd get from the DWT. Compare it to the first image, and you see how much information is hidden!

Figure 1: Ten frequency bands from 512 samples. Where did all the information go???
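
To make the band count concrete, here is a minimal sketch of a full DWT in Python. It assumes the Haar wavelet (the simplest one; the post does not say which wavelet produced the figure), and the function name `haar_dwt_bands` is made up for illustration. Each pass halves the signal, so 1024 samples can only be split 10 times.

```python
import math

def haar_dwt_bands(samples):
    """Return (detail bands, final average) of a full Haar DWT, finest band first."""
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(samples)
    bands = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Each pass splits the signal into a half-length difference (high band)
        # and a half-length average (low band) that is split again next pass.
        bands.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return bands, approx[0]

signal = [math.sin(2 * math.pi * 50 * t / 1024) for t in range(1024)]
bands, average = haar_dwt_bands(signal)
print(len(bands))               # 10 bands from 1024 samples
print([len(b) for b in bands])  # 512, 256, ..., 2, 1 points per band
```

The coefficients still add up to 1024 values, which is why no information is lost, but the lowest bands have only a handful of points in time and there are only 10 rows to draw.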

Because of the DWT, very few people give the CWT (continuous wavelet transform) a second glance. The library is filled with books on wavelets that spend two pages on the CWT, and then talk for the rest of the book about applying the DWT. As a result, people think the DWT is all there is.

Another technique, called the wavelet packet transform, gives you a little more detail. But at the end of it, if you have 1024 sound samples, you will have 1024 transformed points. The more times you perform the algorithm, the more detail you lose in time (and the image looks like a pixellated mess).

Continuous Wavelet Transform

My program applies the continuous wavelet transform to a wave file that you load in, and lets you zoom in to see the individual frequencies that make up a sound. Give it a try!

One problem with it is that it generates a lot of data. Analyzing that sound took 170 MB of memory, and a couple of minutes on my computer. If you tried it on a 5 minute MP3 file, that's 5 * 60 seconds * 44100 samples per second * (44100 / 60) frequency bands = 9.7 billion data points, or about 38 GB of floating point data at 4 bytes per point, and that's if you don't use stereo!
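
The same estimate can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scalogram_size` is made up, and the 4 bytes per value is an assumption (32-bit floats) that matches the figure quoted above.

```python
def scalogram_size(duration_s, sample_rate=44100, bytes_per_value=4):
    """Return (data points, gigabytes) for one channel of scalogram data."""
    samples = duration_s * sample_rate
    bands = sample_rate // 60          # one band per sample length up to sampleRate / 60
    points = samples * bands
    return points, points * bytes_per_value / 1e9

points, gigabytes = scalogram_size(5 * 60)   # a 5 minute file
print(f"{points / 1e9:.1f} billion points, {gigabytes:.1f} GB")
```

Doubling the duration or adding a second channel doubles both numbers, so anything longer than a few seconds gets expensive quickly.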

But it does produce some pretty pictures for short files. Here's a closeup view of the famous tada.wav:

Closeup on a small section of tada.wav

The majestic noise of the "c:\windows\media\recycle.wav" paper crumpling makes a great wallpaper.

How it works

The program loads in a wave file using libsnd. If it is stereo or multichannel, the other channels are ignored and only the first channel is used.
When you see "Rendering... 1%" on the screen, the program is busy calculating the wavelet transform. It first calculates some frequency scales, from 2 samples to sampleRate divided by 60 samples long, and goes through them logarithmically (e.g. 2, 4, 8, 16 samples long).
For each scaling factor, it creates a complex wavelet, with a "real" and an "imaginary" part, whose period is that many samples long. The real part is the cosine function multiplied by a Gaussian, and the imaginary part is the same thing, but with a sine function. This is known as the Morlet wavelet, and it is exceptionally good for sound analysis due to the sine and cosine basis.
Once it has created the wavelet, it convolves the wavelet with the signal. Convolution is kind of like smearing one signal with another. To speed up the algorithm, I perform convolution by multiplying the Fourier transforms of the signal and the wavelet. After the convolution, we end up with the strength of the wavelet in the signal at each point in time.
The process is repeated for each scale level.
Now we have real and imaginary data samples. The magnitudes of the data samples are converted into a huge device independent bitmap in memory, so it can be displayed on the screen. I hope you have lots of RAM.
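
The steps above can be sketched in Python using only the standard library. This is an illustration of the technique, not the program's actual code: the function names, the `width` parameter (Gaussian size in periods), the 3-sigma cutoff, the normalisation, and the slow textbook FFT are all assumptions made to keep the example short.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def log_periods(longest):
    """Wavelet periods in samples, doubling from 2 up to `longest`."""
    periods = []
    p = 2
    while p <= longest:
        periods.append(p)
        p *= 2
    return periods

def morlet(period, width=1.0):
    """Complex wavelet: cosine (real) and sine (imaginary) under a Gaussian."""
    sigma = width * period                      # assumed envelope size
    half = math.ceil(3 * sigma)                 # cut the Gaussian off at 3 sigma
    times = range(-half, half + 1)
    envelope = [math.exp(-0.5 * (t / sigma) ** 2) for t in times]
    norm = sum(envelope)
    return [e * cmath.exp(2j * math.pi * t / period) / norm
            for e, t in zip(envelope, times)]

def fft_convolve(signal, kernel):
    """Convolve by multiplying Fourier transforms; return the part aligned with signal."""
    size = 1
    while size < len(signal) + len(kernel) - 1:
        size *= 2                               # zero-pad to avoid wrap-around
    fs = fft(list(signal) + [0.0] * (size - len(signal)))
    fk = fft(list(kernel) + [0.0] * (size - len(kernel)))
    full = fft([a * b for a, b in zip(fs, fk)], inverse=True)
    half = len(kernel) // 2
    return [v / size for v in full[half:half + len(signal)]]

def scalogram(signal, longest_period):
    """One row of magnitudes per scale, shortest period first."""
    return [[abs(v) for v in fft_convolve(signal, morlet(p))]
            for p in log_periods(longest_period)]

# A pure tone with a period of 16 samples.
tone = [math.cos(2 * math.pi * t / 16) for t in range(256)]
periods = log_periods(32)
rows = scalogram(tone, 32)
strongest = max(range(len(rows)), key=lambda i: rows[i][128])
print(periods, "strongest period mid-signal:", periods[strongest])
```

For a real file, `longest_period` would be the sample rate divided by 60, and each magnitude would be mapped to a pixel brightness. The test tone should light up the row whose wavelet period matches its own and leave the other rows nearly dark.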

Future Work

If I have time in the new year, I'm going to add some fun stuff:

Drag and drop pitch shifting - This is not as easy as moving pixels on the image... first you have to do something called "phase unwrapping". My first cut at a phase unwrapping algorithm didn't work, so I'm trying to translate some Fortran code from a 1981 paper I found. Does anybody have some C code for this???
Boost/Reduce - Draw a square with the mouse and boost or reduce the strength of that region. This could be great for manual noise elimination and sound retouching, or restoring that old copy of Brahms playing the piano.
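
For readers who have not met phase unwrapping, here is a minimal Python sketch of the basic one-dimensional idea: whenever the phase jumps by more than half a turn between neighbouring samples, add or subtract a full turn. This is the simple textbook approach, not the algorithm from the 1981 paper, and it assumes the true phase changes by less than pi between samples, which noisy data can easily violate.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps so consecutive phases never differ by more than pi."""
    out = list(phases[:1])
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        # Undo however many whole turns the wrapped step appears to contain.
        offset -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(cur + offset)
    return out

# A steadily increasing phase, wrapped into (-pi, pi] as atan2 would report it.
true_phase = [0.3 * t for t in range(50)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
unwrapped = unwrap(wrapped)
print(max(abs(u - p) for u, p in zip(unwrapped, true_phase)))  # tiny rounding error
```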

Steve Hanov makes a living working on Rhymebrain.com, PriceMonkey.ca, www.websequencediagrams.com, and Zwibbler.com. He lives in Waterloo, Canada.


Owen Hann

eleven years ago

Hi Steve,

You should check out "cooledit pro", at least version 1.2a, for the sort of stuff you mention above 'boost/reduce'. It has a really nifty method of reducing noise in audio.

I don't know much about wavelets - trying to learn. Are they similar to the "constant-Q transform"?


Paddy Padmanabhan

14 years ago

Dear Steve,

Very interesting and useful work indeed. I landed on your link as I am researching the application of wavelets to pitch bending of music notes.

I can be reached on upaddy [AT] yahoo [DOT] com

I'd love to get connected with you.

Regards and best wishes.

Paddy.
