davbol
Re: Minim audio video sync
Reply #13 - Oct 23rd, 2008, 7:06pm
I've done a fair amount of messing around with offline rendering and sync issues. I've been meaning to post back here in the forum (since i don't blog) in case anyone else might benefit, so here are some further thoughts, fwiw...

There are 3 (4?) general approaches to capturing sync'd audio/video:

1A) do everything real-time, play audio, draw stuff, save video frames to disk, merge audio after-the-fact
1B) do everything real-time, play audio, draw stuff, record both simultaneously with a camera (or external software)
2) do the audio analysis real-time, log the results, but render the video offline by reading the log, merge audio after-the-fact
3) do everything offline, fft the audio one video frame at a time, render that frame & save it, merge audio after-the-fact

Method 1A is practically guaranteed to be laggy and lose audio sync, due to the inexact video framerate of Processing. (and, of course, if you have really complex visuals that further drop the requested video framerate, then it's just that much worse)

Method 1B suffers the same video lag as 1A, but because the recording happens in real time it will make up for it on playback. (the camera will simply record multiple frames of a single rendered frame as necessary until you get around to rendering the next one) So even if you can't render *every* video frame, at least the frames you do render are recorded synchronously with the audio currently playing. (this is essentially just recording a live performance, a non-issue as far as i'm concerned, it works, done)

Method 2 will likely have minor lags, again due to the inexact video framerate of Processing (versus the audio samplerate, which *is* exact). On video frame N you can't be sure that the audio is playing at exactly sample N*samplerate/framerate, so your logged results will have lag built into them.

Method 3 has no lag and can attain "perfect" sync. Consider if you had an *exact* 30fps video rate: 44.1kHz audio requires that 1470 (44100/30) samples be analyzed during each video frame. On frame #1, you cue to sample #0 and fft the next 1470 samples. On frame #2, you cue to sample #1470 and fft the next 1470. On frame #3, you cue to sample #2940 and fft the next 1470. et cetera till end of audio. (technical aside: in order to accomplish an fft of 1470 samples, you'd need to zero pad a buffer of size 2048)

So, if you had audio exactly 60 seconds long, you'd do exactly 1800 fft's (60*30) spanning exactly 2646000 samples (1800*1470, which is the same as 60*44100) and render exactly 1800 video frames, each of which "looked at" an fft of exactly 1470 samples. That's the sort of math you'd like to see for "perfect" sync.

btw, that's a good test of any rendering method - create an audio snippet of exactly 60 seconds, process it, then ask: did i render exactly 1800 video frames? or log exactly 1800 fft frames? if not, then you'll lose sync to some degree when you eventually merge the audio with the video.

Basically, if you do your FFT based on audio that's *playing* in real time (even if you're logging it for later rendering), there's really no way to capture perfectly sync'd audio/video from the draw() thread (other than some variant of method 1B with an external live recorder). A better approach for the method 2 "logging" variants is to start a separate thread and do the fft asynchronously from the draw() thread. That should get you a lot closer - use the sleep(long,int) version to get 1000/30 = 33 millis + 333333 nanos. (and i'd still check it: did it log exactly 1800 fft frames on your 60-second test audio?)
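to make that "separate thread" idea concrete, here's a minimal (untested) sketch of a method 2 logger - the filename, the 30fps target, and the fftLog structure are just placeholders, adjust to taste:

import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
AudioPlayer player;
FFT fft;
ArrayList<float[]> fftLog = new ArrayList<float[]>();  // one spectrum per video frame

void setup() {
  size(200, 200);
  minim = new Minim(this);
  player = minim.loadFile("test60s.mp3", 2048);   // placeholder filename
  fft = new FFT(player.bufferSize(), player.sampleRate());

  // log on a separate thread so the analysis rate isn't tied to draw()'s inexact framerate
  new Thread() {
    public void run() {
      player.play();
      while (player.isPlaying()) {
        fft.forward(player.mix);                  // analyze whatever is playing right now
        float[] spectrum = new float[fft.specSize()];
        for (int i = 0; i < spectrum.length; i++) spectrum[i] = fft.getBand(i);
        fftLog.add(spectrum);
        try {
          Thread.sleep(33, 333333);               // 1000/30 ms = 33 millis + 333333 nanos
        } catch (InterruptedException e) { }
      }
      // sanity check: on a 60 second test clip this should print ~1800
      println("logged " + fftLog.size() + " fft frames");
    }
  }.start();
}

void draw() {
  // nothing sync-critical here - the logger thread owns the timing;
  // when it's done you'd write fftLog to disk and render offline from it
}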
Method 3 gets around all of that by never actually playing the audio, just fft-ing the raw wave data at specific cue points that match up exactly with video frames. And it's really, really easy once you "get" how it works. I've done pieces >15 minutes that retained perfect frame-accurate sync throughout - hard to do with other methods! (or, at least, it was hard for me!! :-D)
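and here's roughly what a method 3 skeleton looks like - i'm using AudioSample.getChannel() to grab the raw wave data (check your Minim version for the exact call), and the filename / fps / drawing code are just placeholders:

import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
FFT fft;
float[] samples;         // the raw wave data - we never actually play it
int videoFps = 30;       // placeholder target framerate
int samplesPerFrame;     // 44100/30 = 1470 at 44.1kHz
int videoFrame = 0;

void setup() {
  size(640, 480);
  minim = new Minim(this);
  AudioSample snd = minim.loadSample("test60s.mp3", 2048);  // placeholder filename
  samples = snd.getChannel(AudioSample.LEFT);
  samplesPerFrame = round(snd.sampleRate() / videoFps);     // 1470
  fft = new FFT(2048, snd.sampleRate());                    // power-of-2 buffer, we zero-pad into it
}

void draw() {
  int cue = videoFrame * samplesPerFrame;          // exact sample position for this video frame
  if (cue + samplesPerFrame > samples.length) {
    // sanity check: exactly 1800 frames for a 60 second clip at 30fps
    println("rendered " + videoFrame + " video frames");
    exit();
    return;
  }

  // copy 1470 samples into a zero-padded 2048 buffer and fft it
  float[] buf = new float[2048];
  System.arraycopy(samples, cue, buf, 0, samplesPerFrame);
  fft.forward(buf);

  // placeholder visuals - draw whatever you like from the spectrum
  background(0);
  stroke(255);
  for (int i = 0; i < fft.specSize() && i < width; i++) {
    line(i, height, i, height - fft.getBand(i) * 4);
  }

  saveFrame("frames/frame-####.png");              // merge with the audio after the fact
  videoFrame++;
}

Hope that helps.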