Thanks very much mots,
In short, method 2 is exactly the approach I was hoping somebody would come up with. In my final application I do quite a bit more processing of the microphone data before taking the inverse FFT, though. Before you say that the sound is crap, I should give away the secret and explain that I'm trying to build a Vocoder!
Basically it works like this: microphone data comes in and an FFT is applied. Let's call that input A, the modulator. It turns out that most of the features of human speech (vowel sounds, consonants, etc.) have a fairly distinctive frequency spectrum, shaped mostly by the mouth and tongue. We would like to encode this information onto a carrier signal to create a kind of singing robot voice.
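For anyone following along, the modulator side boils down to a forward FFT on each block of microphone samples. A minimal Minim sketch of just that step might look like the following (the buffer size and the draw()-based polling are placeholder choices; my real version uses the Listener approach discussed further down):

```java
import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
AudioInput mic;
FFT modFFT;

void setup() {
  size(200, 200);
  minim = new Minim(this);
  // 1024-sample mono input at 44.1 kHz -- example values only
  mic = minim.getLineIn(Minim.MONO, 1024, 44100);
  modFFT = new FFT(mic.bufferSize(), mic.sampleRate());
}

void draw() {
  // analyze the most recent block of microphone samples
  modFFT.forward(mic.mix.toArray());
  // modFFT.getBand(i) now holds the magnitude of frequency bin i
}
```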
There is a second audio signal, the output of a synthesizer (hopefully rich in harmonic content). Let's call that input B, the carrier. In my implementation, this synthesizer is also created in Minim using a set of SawWave generators. Using the computer keyboard, you can choose the musical root note and chord structure (maybe an A major, or an F minor 7th, etc.). Then we take the FFT of input B to get its spectrum as well.
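The carrier side is just a handful of SawWave generators feeding an AudioOutput. Something along these lines, where the frequencies spell out an A major triad (all the numbers are examples, and the keyboard handling is left out):

```java
import ddf.minim.*;
import ddf.minim.signals.*;

Minim minim;
AudioOutput synth;

void setup() {
  size(200, 200);
  minim = new Minim(this);
  synth = minim.getLineOut(Minim.MONO, 1024);
  // A major triad: A3, C#4, E4 (Hz)
  float[] chord = { 220.0f, 277.18f, 329.63f };
  for (int i = 0; i < chord.length; i++) {
    // SawWave(frequency, amplitude, sampleRate) -- a harmonically rich carrier
    synth.addSignal(new SawWave(chord[i], 0.3f, synth.sampleRate()));
  }
}

void draw() {
  // nothing to do here; Minim renders the SawWaves into the output buffer
}
```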
Now, if we scale the spectrum of the carrier by the spectrum of the modulator, the result is a spectrum that has all of the pitch and chord information from the synthesizer but is encoded with the consonant/vowel information from the microphone. All that remains is to take the inverse FFT of the scaled spectrum and send it out to the speakers. Singing Robot!
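In Minim terms, the heart of it is something like the snippet below. This is a sketch of the idea rather than my actual code: modFFT and carrierFFT are forward transforms sized to the block length (as set up above), and in practice you would also want to normalize or smooth the modulator magnitudes so the output level stays under control.

```java
// One block of vocoder processing.
void vocodeBlock(FFT modFFT, FFT carrierFFT,
                 float[] modBuffer, float[] carrierBuffer, float[] outBuffer) {
  modFFT.forward(modBuffer);          // spectrum of the voice (modulator)
  carrierFFT.forward(carrierBuffer);  // spectrum of the synth (carrier)

  // scale each carrier bin by the modulator's magnitude in that bin;
  // scaleBand() scales both the real and imaginary parts, so the
  // carrier's phase is preserved
  for (int i = 0; i < carrierFFT.specSize(); i++) {
    carrierFFT.scaleBand(i, modFFT.getBand(i));
  }

  // inverse FFT of the scaled spectrum straight into the output block
  carrierFFT.inverse(outBuffer);
}
```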
Once I've finished developing the whole application, I will post it to this thread.
I do have a few comments about your code:
- The new Listener and MySignal classes are the bridges that I was missing. They handle refilling the audio output buffers properly, without all of the clicking and popping I was getting with a more naive approach (see the sketch after this list).
- The input to your fft.forward call is an array of REAL numbers representing samples at successive time indices. The output of an FFT is generally an array of COMPLEX numbers representing frequency content at successive frequency indices (sometimes called bins). If you look at the Minim source code, though, you will see that after fft.forward with one argument, the spectrum you read back is the MAGNITUDE of those complex numbers, sqrt(REAL^2 + IMAG^2). This does throw away the phase information (which is like time alignment), as you say.
- In the MySignal class's generate() function, I don't think the for() { out[i] = out[i]; } loop at the end is needed: fft2.inverse(out) already writes directly into the out buffer, as in the sketch below.
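To make that last comment concrete, here is the shape I have in mind for the two bridge classes. This is a hedged sketch, not your code verbatim: I'm guessing at the constructors and fields, and in a real sketch the forward/inverse work would need to be coordinated between the listener and the signal so they don't touch the same spectrum at the same moment.

```java
import ddf.minim.*;
import ddf.minim.analysis.*;

// Pulls in each incoming microphone block and runs the forward FFT on it.
class Listener implements AudioListener {
  FFT fft;
  Listener(FFT fft) { this.fft = fft; }

  public void samples(float[] samp) {
    // forward() stores the complex spectrum internally; getBand(i) then
    // reports sqrt(real^2 + imag^2) for bin i, with the phase discarded
    fft.forward(samp);
  }

  public void samples(float[] sampL, float[] sampR) {
    samples(sampL);
  }
}

// Fills the output buffer by inverse-transforming the (already scaled) spectrum.
class MySignal implements AudioSignal {
  FFT fft2;
  MySignal(FFT fft2) { this.fft2 = fft2; }

  public void generate(float[] out) {
    // inverse() overwrites out with the time-domain samples,
    // so no copy loop is needed afterwards
    fft2.inverse(out);
  }

  public void generate(float[] left, float[] right) {
    generate(left);
    System.arraycopy(left, 0, right, 0, right.length);
  }
}
```

The wiring would then be roughly mic.addListener(new Listener(...)) on the input side and audioOut.addSignal(new MySignal(...)) on the output side (the variable names here are just placeholders).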
Thanks again for the help, and stay tuned for the final product!
Adam