davbol
Re: Minim audio video sync
Reply #13 - Oct 23rd, 2008, 7:06pm
I've done a fair amount of messing around with offline rendering and sync issues. I've been meaning to post back here in the forum (since i don't blog) in case anyone else might benefit, so here are some further thoughts, fwiw...

There are 3 (4?) general approaches to capturing sync'd audio/video:

1A) do everything real-time, play audio, draw stuff, save video frames to disk, merge audio after-the-fact
1B) do everything real-time, play audio, draw stuff, record both simultaneously with a camera (or external software)
2) do the audio analysis real-time, log the results, but render the video offline by reading the log, merge audio after-the-fact
3) do everything offline, fft the audio one video frame at a time, render that frame & save it, merge audio after-the-fact

Method 1A is practically guaranteed to be laggy and lose audio sync, due to the inexact video framerate of Processing. (and, of course, if you have really complex visuals that further drop the requested video framerate, then it's just that much worse)

Method 1B suffers the same video lag as 1A, but because the recording happens in real time it will make up for it on playback. (the camera will simply record multiple frames of a single rendered frame as necessary until you get around to rendering the next one) So even if you can't render *every* video frame, at least the frames you do render are recorded synchronously with the audio currently playing. (this is essentially just recording a live performance, a non-issue as far as i'm concerned, it works, done)

Method 2 will likely have minor lags, again due to the inexact video framerate of Processing (versus the audio samplerate, which *is* exact). On video frame N you can't be sure that the audio is playing at exactly sample N*samplerate/framerate, so your logged results will have lag built into them.

Method 3 has no lag and can attain "perfect" sync. Consider if you had an *exact* 30fps video rate: 44.1kHz audio requires that 1470 (44100/30) samples be analyzed during each video frame. On frame #1, you cue to sample #0 and fft the next 1470 samples. On frame #2, you cue to sample #1470 and fft the next 1470. On frame #3, you cue to sample #2940 and fft the next 1470. et cetera till end of audio. (technical aside: in order to accomplish an fft of 1470 samples, you'd need to zero pad a buffer of size 2048)

So, if you had audio exactly 60 seconds long, you'd do exactly 1800 fft's (60*30) spanning exactly 2646000 samples (1800*1470, which is the same as 60*44100) and render exactly 1800 video frames, each of which "looked at" an fft of exactly 1470 samples. That's the sort of math you'd like to see for "perfect" sync.

btw, that's a good test of any rendering method - create an audio snippet of exactly 60 seconds, process it, then ask: did i render exactly 1800 video frames? or log exactly 1800 fft frames? if not, then you'll lose sync to some degree when you eventually merge the audio with the video.

Basically, if you do your FFT based on audio that's *playing* in real time (even if you're logging it for later rendering), there's really no way to capture perfectly sync'd audio/video from the draw() thread (other than some variant of method 1B with an external live recorder). A better approach for the method 2 "logging" variants is to start a separate thread and do the fft asynchronously from the draw() thread. That should get you a lot closer - use the sleep(long,int) version to get 1000/30 = 33 millis + 333333 nanos. (and i'd still check it: did it log exactly 1800 fft frames on your 60-second test audio?)
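to make that "separate thread" idea concrete, here's a minimal (untested) sketch of a method 2 logger - the filename, the 30fps target, and the fftLog structure are just placeholders, adjust to taste:

import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
AudioPlayer player;
FFT fft;
ArrayList<float[]> fftLog = new ArrayList<float[]>();  // one spectrum per video frame

void setup() {
  size(200, 200);
  minim = new Minim(this);
  player = minim.loadFile("test60s.mp3", 2048);   // placeholder filename
  fft = new FFT(player.bufferSize(), player.sampleRate());

  // log on a separate thread so the analysis rate isn't tied to draw()'s inexact framerate
  new Thread() {
    public void run() {
      player.play();
      while (player.isPlaying()) {
        fft.forward(player.mix);                  // analyze whatever is playing right now
        float[] spectrum = new float[fft.specSize()];
        for (int i = 0; i < spectrum.length; i++) spectrum[i] = fft.getBand(i);
        fftLog.add(spectrum);
        try {
          Thread.sleep(33, 333333);               // 1000/30 ms = 33 millis + 333333 nanos
        } catch (InterruptedException e) { }
      }
      // sanity check: on a 60 second test clip this should print ~1800
      println("logged " + fftLog.size() + " fft frames");
    }
  }.start();
}

void draw() {
  // nothing sync-critical here - the logger thread owns the timing;
  // when it's done you'd write fftLog to disk and render offline from it
}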
Method 3 gets around all of that by never actually playing the audio, just fft-ing the raw wave data at specific cue points that match up exactly with video frames. And it's really, really easy once you "get" how it works. I've done pieces >15 minutes that retained perfect frame-accurate sync throughout - hard to do with other methods! (or, at least, it was hard for me!! :-D)
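and here's roughly what a method 3 skeleton looks like - i'm using AudioSample.getChannel() to grab the raw wave data (check your Minim version for the exact call), and the filename / fps / drawing code are just placeholders:

import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
FFT fft;
float[] samples;         // the raw wave data - we never actually play it
int videoFps = 30;       // placeholder target framerate
int samplesPerFrame;     // 44100/30 = 1470 at 44.1kHz
int videoFrame = 0;

void setup() {
  size(640, 480);
  minim = new Minim(this);
  AudioSample snd = minim.loadSample("test60s.mp3", 2048);  // placeholder filename
  samples = snd.getChannel(AudioSample.LEFT);
  samplesPerFrame = round(snd.sampleRate() / videoFps);     // 1470
  fft = new FFT(2048, snd.sampleRate());                    // power-of-2 buffer, we zero-pad into it
}

void draw() {
  int cue = videoFrame * samplesPerFrame;          // exact sample position for this video frame
  if (cue + samplesPerFrame > samples.length) {
    // sanity check: exactly 1800 frames for a 60 second clip at 30fps
    println("rendered " + videoFrame + " video frames");
    exit();
    return;
  }

  // copy 1470 samples into a zero-padded 2048 buffer and fft it
  float[] buf = new float[2048];
  System.arraycopy(samples, cue, buf, 0, samplesPerFrame);
  fft.forward(buf);

  // placeholder visuals - draw whatever you like from the spectrum
  background(0);
  stroke(255);
  for (int i = 0; i < fft.specSize() && i < width; i++) {
    line(i, height, i, height - fft.getBand(i) * 4);
  }

  saveFrame("frames/frame-####.png");              // merge with the audio after the fact
  videoFrame++;
}

Hope that helps.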