Deconvolution Part 1
A while back I did an assignment for a really fun subject. My assignment was going to be about audio deconvolution using cepstrum filtering. I ended up doing the assignment on something which might have been an idea of my own, but probably isn't, but is still a cool idea. Basically, it is one of those ideas you have which no one told you about, but it isn't so crazily complicated that it is infeasible for someone to come up with the idea.
After writing most of this, I realised that it was very long, so I'm breaking it up into parts. It will end up as either 2 or 3 parts I think.
Deconvolution
Suppose you are a musical purist who likes to pretend you can hear things that aren't there, buys one directional cables made from the trace metal in African swallows, wooden volume knobs from the remote Amazon and leads that have been subjected to zero-gravity and buried for 100 years in order to achieve the best sound quality. You come to the realisation that the only way you can get the true sound of your favourite band is to have them in your own house. So you throw out all your audio gear and pay the band to live at your house and whenever you want music, they will play it for you. Lets suppose you also have a bathroom which is 40 metres long, 4 metres wide and 4 metres high. Every surface in your bathroom is covered in hard tiles. Now imagine that you are at one end having a shower, while the band (wishing to respect your privacy) has set themselves up at the other end of the bathroom.
Ok, so this whole musical purist thing is too hard to keep up. Basically, the sounds you hear are never the original sounds as produced by whatever instruments are making them, but they are those original sounds plus lots of little echos (and in the 40m bathroom case, it would be particularly bad). Deconvolution is the method used for separating the original sound from the echos of it. The echos can be characterised as a list of times when the echos happen, and how loud they are. e.g. "an echo 0ms after the original at 90% volume, an echo 1ms after the original at 50% volume, an echo 4ms after at 10% volume, etc". It is a bit easier to view this as another signal, where the amplitude at a certain time is how much echo there is after that time. For example:

This is a very simple case where we get 90% of the original signal (at t=0ms), an echo after 1ms at 50% volume, and 4ms later a faint 10% echo. Three things to note:
- This particular example is highly unlikely, since the total energy is greater than what was put in (90% + 50% + 10% = 150%).
- I've made it discrete - I don't think of time as a continuous thing because I'm a computer scientist. It is just the way we are.
- The echo at 0ms isn't really an echo - it is just the original sound which is coming straight to us without bouncing off any walls or anything like that. It is just easier to think about it as an echo.
For a real life example, usually the very start of the convolution has a few identifiable peaks corresponding to early reflections (things where the sound has hit only one or two walls before getting to you) and then a horrible mess after that which tapers out (corresponding to all the sounds which bounced forwards and backwards off the front wall and back wall several times before finally hitting a pot plant and making their way to you).
Deconvolution is a problem which is impossible to solve. Unfortunately, for any sound you have there are an infinite number of pairs of "original sound" and "echos" that you can have which make this sound. It is a bit like if we were trying to find out which numbers were multiplied together to get 15. It could have been 1 and 15, 2 and 7.5, 3 and 5, etc. However, although it is impossible, we can still get results which are mostly correct most of the time. The idea is that we say which pairs are more likely to happen. In our 15 example, maybe we know that the numbers are probably whole numbers, and probably near each other - then the original numbers might have been 3 and 5 (or 5 and 3). Sure, they might have been 1 and 15, but when you are doing the impossible you sometimes have to make some compromises.
Convolution
Convolution is the opposite of deconvolution - you take a signal and a list of echos, and you give a new signal with all of these echos applied to it. The most straightforward way of doing this is to take the signal, copy it once for each echo, offset it by that echo, scale it by the echo's volume and then mix all of these signals together. This means that the time taken is proportional to the number of echos multiplied by the length of the signal (since for each signal we need to scale the volume with each different echo). For the size of numbers we are dealing with, this often turns out to be a bit too slow. Fortunately there is a method of speeding it up significantly so it is nearly proportional to just the length of the signal (assuming the signal is longer than the list of echos).
The technique uses Fourier Transforms to do its magic, and it isn't intuitively obvious why it should work either. This seems to be a common theme with things to do with Fourier Transforms. Basically a Fourier Transform takes a signal and tells us what frequency components it is made up of. I will hopefully write about Fourier transforms in more detail later. But the short story is that if you take the Fourier transform of the signal and the convolution, then multiply together the parts that correspond to each other (so if the original signal had a 50hz frequency with amplitude 5, and the convolution had a 50hz frequency with amplitude 2, then the resulting signal has a 50hz frequency with amplitude 10). Then invert the Fourier Transform (which is just the opposite of a Fourier Transform - building a signal from frequency components) then the result is the signal convolved with the convolution. You will have to take my word for it at the moment, but don't take my word too strongly, I've used a bit of mathematician's license with the description, and I have a feeling that I was supposed to reverse one of the signals too, but those are just details.
Convolution is very similar to multiplication like you do at school - you have two numbers (one representing the signal, one representing the echos) and each pair of digits is multiplied together in some order and mixed together (the final step where you add all the numbers together). The main difference is that there is no carrying:

Compare with convolution: (convolving 7123 with 2181)

The two calculations are pretty much the same, and if you mind is warped enough then they have the same answer, just the convolved form has slightly more information in it. So from this, you can see my justification for why deconvolution is impossible. If I tell you {14, 9, 61, 23, 20, 26, 3} and ask you what numbers were convolved to give this answer, then you will have a bit of trouble doing so.

RSS