We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Minim audio video sync (Read 3957 times)
Minim audio video sync
Mar 30th, 2008, 2:57pm
 
Hi, just wondering if anyone has come across this before...

I've made some audio-responsive visuals using Minim and now want to export to video using the save() method.
Because the visuals are quite heavy, I'm losing a lot of frames.

Is there a way with Minim to seek to a certain position and get the fft data at that point in the song?
Re: Minim audio video sync
Reply #1 - Apr 1st, 2008, 3:29am
 
There is currently not a terribly good way to do this. But there are other people on the board who are doing this kind of audio/visual sync and hopefully they will chime in with their solutions.
Re: Minim audio video sync
Reply #2 - Apr 1st, 2008, 2:26pm
 
yeh, i've read about solutions that first output the audio data to a file and then read it back in on a per-frame basis.

i've also read that ESS can return fft data for any given time in a song, although i had too many problems with ESS and followed the masses to Minim.

hopefully some Minim users might have an answer.
Re: Minim audio video sync
Reply #3 - Apr 2nd, 2008, 10:31pm
 
It's worth using ESS for this very feature.  I haven't had any problems with it either.  Give it a try.

The other solution would be to FFT the audio data yourself.   Retrieve 1470 samples at a time (1 frame's worth for 44.1 kHz at 30 FPS) and FFT it with a 2048-bin FFT (it must be a power of two; the extra slots can be zero, and it won't affect the FFT. This is called zero-padding.) I have general FFT code for Processing if you want it.
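As a plain-Java sketch of the arithmetic above (class and method names are illustrative, not any library's API): 1470 samples per 30fps video frame at 44.1 kHz, zero-padded to the next power of two:

```java
// Per-frame sample count and zero-padding, assuming 44100 Hz audio at 30 fps.
public class FrameSamples {
    // smallest power of two >= n
    static int nextPow2(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        int fps = 30;
        int samplesPerFrame = sampleRate / fps;      // 1470
        int fftSize = nextPow2(samplesPerFrame);     // 2048

        // zero-pad: copy the frame's samples, leave the rest at 0.0f
        float[] frame = new float[samplesPerFrame];  // would be filled from the audio
        float[] padded = new float[fftSize];
        System.arraycopy(frame, 0, padded, 0, samplesPerFrame);

        System.out.println(samplesPerFrame + " samples/frame, FFT size " + fftSize);
    }
}
```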
Re: Minim audio video sync
Reply #4 - Apr 3rd, 2008, 5:27am
 
yes definitely, that sounds very interesting, send it please.

in the mean time i'll give ESS another go.

cheers movax.
Re: Minim audio video sync
Reply #5 - Apr 3rd, 2008, 6:39am
 
Ah, but I have another perspective! There are numerous benefits to the Minim solutions:

1.) Scan the entire audio to a spectrum file (make up your own file format; it's fairly simple to do). These files really don't get as big as you'd think: at 60 channels per frame, I got a 10 MB file (and my file format was really verbose) for a 3:00 song.

2.) Scan ahead in, say, 5-second intervals (pause the track, play, read the fft as it plays, pause the track, then render according to the data you gathered over those 5 seconds). I like the file better, but this is also doable.

Now, for most purposes you may appreciate the simplicity of just using Ess. But if you make this spectrum data file, you have added capabilities:

bi-directional smoothing: you can 'smooth' the fft results in both directions! So if you think a drumbeat is affecting your visualization too much, you can look ahead and anticipate future changes. There's so much potential here. I really recommend not taking the "get a sample, decode the fft, and render" route, because the file format will give you more options and in the end I think it will save you rendering time, though it's not a big deal.

Just my opinion, as I have made a few FFT music videos in the past few months.
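A minimal plain-Java sketch of the bi-directional smoothing idea, assuming the per-frame fft results have already been loaded from the spectrum file into an array; the window radius and data layout are assumptions, not the poster's actual format:

```java
// Smooth logged FFT frames in BOTH directions: each frame is averaged with
// frames behind AND ahead of it, which is impossible with live audio.
public class Smooth {
    // frames[f][b] = value of band b at video frame f; radius = frames to
    // look behind/ahead
    static float[][] smooth(float[][] frames, int radius) {
        int n = frames.length, bins = frames[0].length;
        float[][] out = new float[n][bins];
        for (int f = 0; f < n; f++) {
            int lo = Math.max(0, f - radius);     // clamp at the song's start
            int hi = Math.min(n - 1, f + radius); // clamp at the song's end
            for (int b = 0; b < bins; b++) {
                float sum = 0;
                for (int k = lo; k <= hi; k++) sum += frames[k][b];
                out[f][b] = sum / (hi - lo + 1);  // windowed average
            }
        }
        return out;
    }
}
```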
Re: Minim audio video sync
Reply #6 - Apr 3rd, 2008, 7:46pm
 
FFT code, in a minim example:

Code:

// Alternate FFT method for audio
// movax : Kristopher Collins
// RGB.nu


import ddf.minim.*;

AudioInput in;

float []fftdata=new float[512];
float []newfftdata=new float[512];

void setup()
{
size(256, 200, P3D);
Minim.start(this); // always start Minim before you do anything with it
in = Minim.getLineIn(Minim.STEREO, 512, 44100); // get a line in from Minim, default bit depth is 16

}

void draw()
{
background(0);
stroke(255);

for(int i = 0; i < in.bufferSize(); i++) newfftdata[i]=in.left.get(i); //Get audio data into array

audioFFT(newfftdata); //In-place FFT of data

for(int i = 0; i < in.bufferSize()/2; i++) //We only draw 256 bins, because only the first half of a real FFT is unique
{
if(newfftdata[i]>fftdata[i]) fftdata[i]=newfftdata[i]; //If the bin has increased, show it
else fftdata[i]*=.95; // Otherwise, damp it down
line(i,height-5,i,height-5-abs(fftdata[i]*1500.)); //draw the spikes
}
}


void stop()
{
// always close Minim audio classes when you are done with them
in.close();
// always stop Minim before exiting
Minim.stop();
super.stop();
}



void audioFFT( float []values)
{
//values[] ARRAY SIZE MUST BE A POWER OF TWO - 256,512,1024, etc
//In-place radix-2 FFT; on return, values[] holds scaled magnitudes

int n,i,i1,j,k,i2,l,l1,l2,m;
float c1,c2,tx,ty,t1,t2,u1,u2,z;

n=values.length;
m=int(log(float(n))/log(2));
float []imag=new float[n]; // imaginary part, starts at zero

// Do the bit reversal
i2 = n >> 1;
j = 0;
for (i=0;i<n-1;i++) {
if (i < j) {
tx = values[i];
values[i] = values[j];
values[j] = tx;
ty = imag[i];
imag[i] = imag[j];
imag[j] = ty;
}
k = i2;
while (k <= j) {
j -= k;
k >>= 1;
}
j += k;
}

// Compute the FFT (full complex butterfly, so later stages see the imaginary parts)
c1 = -1.0;
c2 = 0.0;
l2 = 1;
for (l=0;l<m;l++) {
l1 = l2;
l2 <<= 1;
u1 = 1.0;
u2 = 0.0;
for (j=0;j<l1;j++) {
for (i=j;i<n;i+=l2) {
i1 = i + l1;
t1 = u1 * values[i1] - u2 * imag[i1];
t2 = u1 * imag[i1] + u2 * values[i1];
values[i1] = values[i] - t1;
imag[i1] = imag[i] - t2;
values[i] += t1;
imag[i] += t2;
}
z = u1 * c1 - u2 * c2;
u2 = u1 * c2 + u2 * c1;
u1 = z;
}
c2 = -sqrt((1.0 - c1) / 2.0);
c1 = sqrt((1.0 + c1) / 2.0);
}

for (i=0;i<n;i++) values[i] = sqrt(values[i]*values[i]+imag[i]*imag[i])/n*pow(1.6,(i/50.))*3.; // Magnitude with exponential scaling. Adjust values as desired.

}



The spectrum file is a cool idea also.
Re: Minim audio video sync
Reply #7 - Apr 3rd, 2008, 11:47pm
 
That code looks great, but how are you manually advancing the AudioInput forward? The code looks like you're just constantly re-reading the input's current buffer... and actually, it looks like you're analyzing off of the microphone input stream O.o

How are you "pushing", say, a .wav file's sound buffer into AudioInput in?
Re: Minim audio video sync
Reply #8 - Apr 4th, 2008, 12:19am
 
That example was designed to use the audio input.  Not exactly what you wanted, but it does demonstrate the FFT function.

I don't use Minim, but you would just retrieve the 1470 samples at a time with whatever Minim function reads the sample data of a .wav.
Re: Minim audio video sync
Reply #9 - Apr 8th, 2008, 2:26am
 
thx for that code movax! i like the fact you can scale your fft result the way you see fit; it'll definitely come in handy when needing to pronounce a certain range in the spectrum.

before your post i actually went down the path of exporting the fft data to a text file in a separate sketch and reading those values back in during rendering. it's worked as far as i can see; there was a little lag, but it was easily adjusted by shifting the audio layer up over the video layer. i still need to test this technique over a long audio file to make sure it doesn't lag any further.

i think your approach is more stable, so i'm going to try plugging it into minim.

thx again.
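For the export-then-read-back approach described above, the reading side might look like this plain-Java sketch; the one-line-per-video-frame, comma-separated layout is an assumption, since the poster doesn't show their actual file format:

```java
// Parse one logged FFT frame: a line of comma-separated band values.
// During rendering, line N of the log corresponds to video frame N.
public class FftLog {
    static float[] parseFrame(String line) {
        String[] parts = line.split(",");
        float[] bands = new float[parts.length];
        for (int i = 0; i < parts.length; i++)
            bands[i] = Float.parseFloat(parts[i].trim());
        return bands;
    }

    public static void main(String[] args) {
        float[] bands = parseFrame("0.12, 3.4, 7.0");
        System.out.println(bands.length); // 3 bands for this frame
    }
}
```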
Re: Minim audio video sync
Reply #10 - Apr 15th, 2008, 9:26pm
 
i'd switch to minim (over ESS) if it weren't for this one (glaring) omission.  i prefer minim's fft implementation, and would rather not reinvent yet another, but i require the ability to do it non-real-time (which is really easy with ESS, but impossible with minim) because i can't stand losing sync because of an approximate video frame rate!

so what's a guy to do?  er, use both??  why not???  ;)

this isn't 100% complete-ready-to-run code, but the pieces are all there, hope it helps:

Code:


/** rough framework for doing non-realtime
   fft analysis with ESS/Minim hybrid
*/

import krister.Ess.*;
import ddf.minim.analysis.*;
import ddf.minim.*;

AudioChannel audio; // use ESS to load the full audio file
int fftBufferSize = 1024; // define this explicitly
ddf.minim.analysis.FFT fft; // fully qualify to resolve conflict

// here's where the translation will occur:
// Minim requires input buffer to fft be exactly right size,
// so this will contain a "snippet" from the full Ess AudioChannel
// at the desired cue point per frame:
float [] tempBuffer;

void setup() {
 //...
 // start Ess, don't start Minim
 Ess.start(this);
 // Ess creates the audio file
 audio = new AudioChannel(dataPath("jingle.mp3"));
 // Minim creates the fft
 fft = new ddf.minim.analysis.FFT(fftBufferSize, audio.sampleRate);
 // we create the translation buffer
 tempBuffer = new float[fftBufferSize];
 //...
}

void stop() {
 Ess.stop();
 // don't need to stop Minim, never started
 super.stop();
}

void draw() {
 // define your cue point (in samples) into the audio data
 // (typically something like: frameNumber/framesPerSecond*samplesPerSecond)
 int cuepoint = 12345;
 // transfer relevant sample data to translation buffer
 arraycopy(audio.samples, cuepoint, tempBuffer, 0, fftBufferSize);
 // do the analysis:
 fft.forward(tempBuffer);
 // then use the fft results just as you would otherwise...
}
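The cue-point line in the draw() comment above is just integer arithmetic. A hedged plain-Java sketch (names like frameNumber and samplesPerSecond are illustrative, not Minim or Ess API), with a bounds check so the final frame never reads past the end of the sample array:

```java
// Cue-point arithmetic for non-realtime rendering: which sample offset
// corresponds to a given video frame.
public class CuePoint {
    static int cuePoint(int frameNumber, int framesPerSecond, int samplesPerSecond) {
        // sample offset of video frame N
        return frameNumber * samplesPerSecond / framesPerSecond;
    }

    static boolean fits(int cuepoint, int fftBufferSize, int totalSamples) {
        // true if a full FFT buffer can be copied starting at cuepoint
        return cuepoint + fftBufferSize <= totalSamples;
    }

    public static void main(String[] args) {
        // frame 30 at 30 fps and 44100 Hz lands exactly on sample 44100
        System.out.println(cuePoint(30, 30, 44100));
    }
}
```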

Chime
Reply #11 - Oct 21st, 2008, 4:43pm
 
I have found this whole thread really interesting, and the non-real-time mapping of audio analysis data to an anim is something I am experimenting with.

However, Processing and audio are areas I have limited development experience with and am only starting to play in, so I would really appreciate feedback on the following sketch I developed based on what I have read and learnt.

I still need to test syncing anims with the data file.

Code:

/////////////////////////////////////////////////////////////////////
/**
* FFT Logarithmic Averages to TextFile
* by gLASSHOPPER*
*
* It is an experiment - updated for cleaner logging!!!
*/
/////////////////////////////////////////////////////////////////////

/////////////////////////////////////////////////////////////////////
// VARIABLES //
import ddf.minim.analysis.*;
import ddf.minim.*;

Minim minim;
AudioPlayer audio;
AudioMetaData meta;
FFT fft;
PrintWriter output;

int yi = 15;

int tFrameRate = 25;
String audioFileName = "test.mp3";

int setupMilli; int playMilli; int playSecs;
int curFrame = 0; int logFrame = 0; String audioLog = "";

int minHz = 20;
int maxHz = 500;
int bands = 2;

int durSecs; int durMilli;

float[] arrSpectrum;


/////////////////////////////////////////////////////////////////////
// MAIN //

// SETUP //
void setup() {
size(512, 200, P3D);
minim = new Minim(this);

textFont( loadFont("Helvetica.vlw") );
textMode(SCREEN);

audio = minim.loadFile(audioFileName, 2048);

// get metadata for audio
meta = audio.getMetaData();

// create FFT object
fft = new FFT(audio.bufferSize(), audio.sampleRate());

// Sets the window to use on the samples
// before taking the forward transform.
fft.window(FFT.HAMMING);

// calculate averages based on a minimum octave width of minHz
// split each octave into bands
fft.logAverages(minHz, bands);

// setup arrSpectrum to contain each avg value from spectrum
arrSpectrum = new float[fft.avgSize()];
rectMode(CORNERS);

// Create a new output file in the sketch directory
output = createWriter(meta.fileName()+".txt");

// get track dur from metadata associated with file
durSecs = meta.length()/1000;
durMilli = meta.length()-(meta.length()/1000)*1000;

// log the amount of time setup took to complete
println("Setup Complete: "+millis());
println("File Name: "+meta.fileName()+" Length: "+durSecs+":"+durMilli);
setupMilli = millis();
// play the file
audio.play();
}


// DRAW //
void draw() {
background(0);
fill(255);

// reset text pos to yi
int y = yi;

// perform a forward FFT on the samples in audio's mix buffer
fft.forward(audio.mix);
int w = int(width/fft.avgSize());


// draw a rectangle for each average
for(int i = 0; i < fft.avgSize(); i++) {
rect(i*w, height, i*w + w, height - fft.getAvg(i)*5);

// update the average value in array
arrSpectrum[i] = round(fft.getAvg(i));
}

// update playMilli to current play duration minus setupMilli
playMilli = millis() - setupMilli;
playSecs = (playMilli/1000);

// remap milliseconds for current second between 1 + tFrameRate
curFrame = round(map((playMilli-(1000*playSecs)),0,1000,1,tFrameRate));

// if curFrame is not the same as last logged frame call logger
// ensure we log only based on the framerate
if(curFrame != logFrame) { logSpectrum(); }

// update text output so user can see what is happening
text("File Name: " + meta.fileName(), 5, y);
text("Length: " + durSecs+":"+durMilli, 5, y+=yi);
text("Timer: " + (playSecs)+":"+curFrame, 5, y+=yi);

// exit case if we are beyond the play duration of the track
if (playMilli/1000 > meta.length()/1000) {
end();
}
}


// STOP // final output to console and EXIT ...
void stop() {
println("DONE: "+(playMilli/1000)+":"+curFrame);
exit();
}


/////////////////////////////////////////////////////////////////////
// FUNCTIONS //

// LOGSPECTRUM // logs spectrum data at current frame to audioLog
void logSpectrum() {
logFrame = curFrame;
String joinedSpectrum = join(nf(arrSpectrum, 3, 0), ",")+"\n";
audioLog += nf(playSecs,2)+":"+nf(curFrame,2)+"\t"+joinedSpectrum;
//println(audioLog);
}


// END // cleans up minim classes write log file + forces close
void end() {
// always close Minim audio classes when you finish with them
audio.close();
minim.stop();

println(audioLog);

output.print(audioLog);
output.flush();
output.close();

super.stop();
}

Verbosity
Reply #12 - Oct 23rd, 2008, 4:17pm
 
I am realising that my code above is a little verbose, and am in the process of wrapping it into one class for logging and another for analysing the log. I also realised I have reinvented the wheel in some places and that there are some bugs in the code listing above. Where is the best place to post code for the community?

Once tested as a movie, this technique works adequately; the next trick is to work out which visual is most elegantly animated by which audio peak.
Re: Minim audio video sync
Reply #13 - Oct 23rd, 2008, 7:06pm
 
I've done a fair amount of messing around with offline rendering and sync issues.  I've been meaning to post back here in the forum (since i don't blog) in case anyone else might benefit, so here are some further thoughts, fwiw...

There are 3 (4?) general approaches to capture sync'd audio/video:

1A) do everything real-time, play audio, draw stuff, save video frames to disk, merge audio after-the-fact
1B) do everything real-time, play audio, draw stuff, record both simultaneously with a camera (or external software)
2) do the audio analysis real-time, log the results, but render video offline by reading the log, merge audio after-the-fact
3) do everything offline, fft the audio one video frame at a time, render that frame & save it, merge audio after-the-fact

Method 1A is practically guaranteed to be laggy and lose audio sync, due to the inexact video framerate of Processing. (and of course, if you have really complex visuals that further drop the requested video framerate, then it's just that much worse)

Method 1B suffers the same video lag as 1A, but due to the realtime recording it will make up for it on playback.  (the camera will simply record multiple copies of a single rendered frame as necessary until you get around to rendering the next one)  So even if you can't render *every* video frame, at least the frames you do render are recorded synchronously with the audio currently playing.  (this is essentially just recording a live performance, a non-issue as far as i'm concerned, it works, done)

Method 2 will likely have minor lags, again due to the inexact video framerate of Processing (versus the audio samplerate, which is *exact*).  So on video frame N you can't be sure that the audio is playing at exactly sample N*samplerate/framerate, so your logged results will have lag built into them.

Method 3 has no lag and can attain "perfect" sync.

Consider if you had an *exact* 30fps video rate:  playing 44.1kHz audio requires that 1470 (44100/30) samples are analyzed during each video frame.  On frame #1, you cue to sample #0 and fft the next 1470 samples.  On frame #2, you cue to sample #1470 and fft the next 1470.  On frame #3, you cue to sample #2940 and fft the next 1470.  Et cetera till the end of the audio.

(technical aside: in order to accomplish an fft of 1470 samples, you'd need to zero pad a buffer of size 2048)

so, if you had audio exactly 60 seconds long, you'd do exactly 1800 fft's (60*30) spanning exactly 2646000 samples (1800*1470) and render exactly 1800 video frames, each of which "looked at" an fft of exactly 1470 samples.  That's the sort of math you'd like to see for "perfect" sync.

btw, that's a good test of any rendering method - create an audio snippet of exactly 60 seconds, process it, then ask:  did i render exactly 1800 video frames? or log exactly 1800 fft frames?  if not, then you'll lose sync to some degree when you eventually merge the audio with the video.
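The 60-second test above is just arithmetic; as a tiny plain-Java check (all numbers are from the post):

```java
// Perfect-sync bookkeeping: frames * samples-per-frame must cover the audio
// exactly, with no remainder.
public class SyncCheck {
    public static void main(String[] args) {
        int sampleRate = 44100, fps = 30, seconds = 60;
        int samplesPerFrame = sampleRate / fps;   // 1470
        int expectedFrames = seconds * fps;       // 1800
        int totalSamples = seconds * sampleRate;  // 2646000
        // every sample is analyzed exactly once
        System.out.println(expectedFrames * samplesPerFrame == totalSamples);
    }
}
```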

Basically, if you do your FFT based on audio that's *playing* in realtime (even if you're logging it for later rendering), there's really no way using the draw() thread to capture perfectly sync'd audio/video (other than some variant of method 1B with an external live recorder).  A better approach for the method 2 "logging" variants would be to start a separate thread, doing the fft asynchronously from the draw() thread.  That should get you a lot closer; use the sleep(long,int) version to get 1000/30 = 33 millis + 333333 nanos. (and i'd still check it: did it log exactly 1800 fft frames on your 60-second test audio?)
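The separate-thread suggestion might be sketched like this in plain Java; the fft/log step is a placeholder comment, and this only demonstrates the sleep(long,int) timing, not any Minim or Ess call:

```java
// An analysis loop that ticks at the exact video-frame interval
// (33 ms + 333333 ns = 1/30 s), independent of Processing's draw() thread.
public class AnalysisThread {
    static final long FRAME_MILLIS = 1000 / 30; // 33
    static final int  FRAME_NANOS  = 333333;    // remainder of 1/30 s

    public static void main(String[] args) throws InterruptedException {
        Thread analyzer = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // fft the current audio buffer and append to the log here
                    Thread.sleep(FRAME_MILLIS, FRAME_NANOS);
                }
            } catch (InterruptedException e) {
                // interrupted: stop logging
            }
        });
        analyzer.start();
        Thread.sleep(100);    // let it run briefly
        analyzer.interrupt(); // then stop it
        analyzer.join();
    }
}
```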

Method 3 gets around all of that by never actually playing the audio, just fft-ing the raw wave data at specific cue points that match up exactly with video frames.  And it's really, really easy once you "get" how it works.  I've done pieces >15 minutes that retained perfect frame-accurate sync throughout, which is hard to do with the other methods!  (or at least, it was hard for me!! :-D)  Hope that helps.
Re: Minim audio video sync
Reply #14 - Nov 2nd, 2008, 10:11pm
 
This is really useful - tx ... As I understand it though, Method 3 is the perfect situation, but it is not possible with Minim because it cannot seek to a cue point. Did you succeed with Method 3 using ESS and Sonia? If so, are you only analysing mono files?