We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Beat Detection (Read 15782 times)
Beat Detection
Mar 9th, 2007, 6:15pm
 
Re: Beat Detection
Reply #1 - Mar 9th, 2007, 10:17pm
 
It was ddf. Sorry I missed all that. I've been avoiding sound analysis for ages. I'm interested in combining it with some heuristic animation.

Processing Alpha threads:

http://processing.org/discourse/yabb/YaBB.cgi?num=1165273417#2

Edit:

More -->

A Java Beat Detector

musicdsp.org sound analysis page

sound analysis in Flash

Helmholtz Harmony class (Java)

Sonic Art Websites

Sound Toys

Cybersonica

Java Music Projects
Re: Beat Detection
Reply #2 - Mar 12th, 2007, 10:53pm
 
Wow, that's a pretty serious roundup. It would have been nice to have that when I was writing the beat detection class. FWIW, I used the gamedev article as my starting point. Probably I should check out that other stuff, too.
Re: Beat Detection
Reply #3 - Mar 13th, 2007, 5:52pm
 
Can I chime in here with a somewhat related issue?  Timing and synchronization.

I'd appreciate comments from anyone with experience dealing with audio/video synchronization issues, which are particularly evident when doing beat-detection stuff.  

In particular, what seems problematic is the capture and assembly of synchronized a/v into a movie (pick your format).  Contrast with running "real-time", where a little slop in the video frame rate is made up for by the fact that the next frame (whenever it occurs exactly) will be processing the current audio stream, so it self-corrects any timing irregularities.

The problem comes when trying to capture both streams and sync them up later; they won't sync exactly.  frameRate(30) just isn't accurate enough, and forget about frameRate(29.97).  Even an accumulated error of just one frame per 10 seconds quickly adds up to a significant problem.

Granted, you could just hook up a camcorder to the realtime output and grab both, but it seems a waste to go through that lossy conversion stage.  Given a digital audio source and a digital video source, how best to export them directly *and* in such a way that they'll sync perfectly?

Related topics might include accessing the vertical retrace for a fixed sync via opengl. (iow, if you could write out frames at *exactly* 30 fps, via MovieMaker or discrete TIFs or whatever, then you could theoretically later assemble those and expect them to sync with audio when played back at exactly 30 fps).  Or any hacks into JMF/QT4J/etc for directly writing the interleaved streams out at runtime, and how you'd go about "wiring" an ESS/Sonia stream and your PImage's into them.
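To make the discrete-TIF idea concrete, here's an untested sketch: drive the animation from the frame count instead of the clock, so every saved frame represents exactly 1/30 sec no matter how long it actually took to render. (Of course that only helps if the analysis can be driven from an audio file rather than a live input.)

Code:

// untested sketch: non-realtime rendering driven by frame count instead of wall-clock time
int fps = 30;

void setup() {
  size(320, 240);
}

void draw() {
  // the moment in the track this frame represents, regardless of how long it renders
  float songTime = frameCount / float(fps);
  background(0);
  // ...draw whatever the animation should look like at songTime...
  float x = (songTime * 40) % width;   // placeholder animation
  rect(x, height/2, 10, 10);
  // assemble these later at exactly 30 fps and lay the original audio underneath
  saveFrame("frames/frame-####.tif");
}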

I'm nearly resolved to looking into the QT4J SDK to see if I can figure out an "easy" way to hack an audio stream into Shiffman's MovieMaker library.  Ugh.

And of course super-duper bonus points to anyone who points out that I'm an idiot for overlooking an existing solution that already works perfectly!  (which is as likely as not)  Smiley
Re: Beat Detection
Reply #4 - Mar 13th, 2007, 7:02pm
 
If you're using OpenGL on Windows, a program like FRAPS (http://www.fraps.com/) will be able to record the output of a sketch, including audio (unless you have a weird sound setup).

The demo version is limited to 30 seconds and leaves a watermark, but will let you try it to see if it would work for your needs.
Re: Beat Detection
Reply #5 - Mar 13th, 2007, 10:33pm
 
John:  excellent, thanks!  (bonus points awarded)  Too bad it won't capture ANY window, but it sure does a good job with OpenGL.
Re: Beat Detection
Reply #6 - Mar 17th, 2007, 7:26pm
 
I've looked at the traditional BeatDetection class, and it differs a lot from the approach in the blog entry flight404 has up.

Instead of using a comparison against the last 1024 samples, he uses a dying beat threshold.
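As far as I can make out from the write-up, the gist is something like this (my own paraphrase, not flight404's actual code): the threshold decays every frame, and whenever the current level climbs above it you call that a beat and snap the threshold back up to that level.

Code:

// paraphrase of the dying-threshold idea (not flight404's code - the decay value is made up)
float threshold = 0.0;
float decay = 0.002;

boolean checkBeat(float level) {
  if (level > threshold) {
    threshold = level;                     // snap back up on a beat
    return true;
  }
  threshold = max(0, threshold - decay);   // otherwise let the threshold die down
  return false;
}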

I put together an example from flight404's write-up of the method, and it seems a little more true than the last-1024-samples method. The problem with 1024 is that it's really prone to being set off by anything, whereas threshold death lands smack on a beat about 80% of the time (I was testing it with some Amon Tobin - some pretty messy beats).

I'm going to code up a face-off of 1024 samples vs threshold death to see which is the more reliable method.

I'd really like to get a tempo measurement but it's looking unlikely, so my plans for a heuristic animation are gonna have to adapt.

Here's some threshold death code for you folks to improve upon and try out (note the total lack of thought in positioning the zones):

Applet here

(I had to go through the grief of signing it - this link here is for my use so I can get back to my tutorial to myself on how to do that)

Please post any improvements / suggestions.
Re: Beat Detection
Reply #7 - Mar 17th, 2007, 10:01pm
 
The thing with that method is that I'm pretty sure the FFT class in Ess already has that sort of thing built in with the maxSpectrum and maxAverages arrays. It also lets you set how quickly the values decay.  When you get right down to it, the decaying max average is essentially the same technique that I use in my BeatDetection class, but I'm not convinced that it is the better technique.
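Off the top of my head (so don't hold me to the exact field names), using the built-in decaying maxima would look roughly like:

Code:

fft.damp(.1f);    // how quickly the decaying maxima fall back
fft.averages(32);
// then, per frame, treat a band as "on" when it comes close to its decaying maximum
for (int i = 0; i < 32; i++) {
  boolean on = fft.averages[i] > fft.maxAverages[i] * 0.9;
  // ...
}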
Re: Beat Detection
Reply #8 - Mar 18th, 2007, 3:29pm
 
My whole issue with just splitting the FFT into three neat bands is that I really don't think the sound works like that.

I'd like the zones to be completely user defined, meaning they can overlap, be too long or too short. I'm trying to figure out how to get your BeatDetection class to do that so I can do a proper comparison.

I've looked at isRange() but I don't know where I'm getting the threshold value from. I thought the threshold was supposed to be the average of the last 1024 samples.
Re: Beat Detection
Reply #9 - Mar 18th, 2007, 5:02pm
 
No, it's more complicated than that.  The BeatDetection class works by tracking the values in the averages array of the internal FFT. You can set how many averages it uses by calling detectDetail(). It's got a one second memory for each average band, so to speak. So each time a new value is received it compares it to a sort-of filtered average of the previous one second of values and if the new value is higher than that number it registers a beat in that average band. But it isn't a hard-and-fast threshold.  The threshold used adjusts itself based on whether the signal being received is very loud or very soft. So a loud portion of a song won't muck up the detection abilities during a later soft portion.

There is a boolean array corresponding to the average array of the FFT. The algorithm flips the values in this array to true or false depending on whether it detects an onset in the corresponding average band. isRange(), then, lets you group together boolean values and set your own "threshold" for calling it a beat. So when you say isRange(5, 20, 10), you are saying, "Look at the boolean values in the onset array in the range of 5 to 20. If at least 10 of those are true, then return true." So, it's not a way to change the way the algorithm works, but merely a way to interpret the raw data.
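In code terms the check boils down to something like this (a simplified sketch, not the exact source):

Code:

// simplified: onset[] holds the per-band true/false flags
boolean isRange(int low, int high, int threshold) {
  int count = 0;
  for (int i = low; i <= high; i++) {
    if (onset[i]) count++;
  }
  return count >= threshold;   // enough bands fired at once to call it a beat
}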

The function isSnare() uses isRange() in this way, and I decided on the values by staring at a visualization of the entire onset array while listening to music with a strong backbeat. You can kind of see how the things you want to detect are registering, and it's never as cut and dried as "band number 6 corresponds to the snare drum".

Does that answer your question?
Re: Beat Detection
Reply #10 - Mar 18th, 2007, 6:17pm
 
My problem is that I don't want Ess doing the averages. I want to be able to say, "from bin 34 to bin 100". If I just want 3 zones, then Ess is already performing 61 unnecessary calculations if I use isRange(). Twice over, in fact, because Ess calculates all those averages, then those averages are buffered, and then a group test has to be performed.

Ess doesn't already do what I'm doing with some zones and thresholds because it offers no fine control over averaging the bins.

I'm pretty much at the stage where I'm going to have to write BeatDetection from scratch if I want it to examine only the bins I want it to.
Re: Beat Detection
Reply #11 - Mar 18th, 2007, 10:10pm
 
Ah ha, yes, gotcha. If you want that kind of control over the average bins themselves, then, yes, you'd have to compute the averages yourself. However, once you did that, it wouldn't be a very big modification to the BeatDetect class to use your averages instead of the FFT's.
Re: Beat Detection
Reply #12 - Mar 19th, 2007, 3:42pm
 
One of the big issues I have with the whole energy buffer idea is that it makes it sound like you have to punch a whole array of data back and forth with arraycopies.

To get rid of all the array shuffling, you could simply track a playhead moving over the buffer, overwriting the oldest averages. No array copying necessary.

So my trouble at the moment is that I have an energy average buffer now - it seems to work a lot better than thresholds (it responds nicely to Jackson and his Computer Band), but I haven't done it the GameDev fashion, so I'm still registering beats when it's quiet. (The fact is, I don't understand the equations; neither my code nor ddf's is executing 44100 times a second, so I compromised and compared one frame against an average of the last 40.) Here's my code. I'm gonna take a brain break and then see if I can pull the essentials out of ddf's code.

Code:

//import processing.opengl.*;
import krister.Ess.*;

FFT fft;
AudioInput in;
int bufferSize;
BeatZone2 bd;
float [] bdFill;

void setup(){
  size(532, 230);//,OPENGL);
  Ess.start(this);
  bufferSize = 1024;
  in = new AudioInput(bufferSize);
  fft = new FFT(bufferSize);
  fft.equalizer(true);
  // set up our FFT normalization/dampening
  float minLimit=.005;
  float maxLimit=.05;
  float myDamp=.1f;
  int numAverages=32;
  fft.limits(minLimit,maxLimit);
  fft.damp(myDamp);
  fft.averages(numAverages);
  bdFill = new float[]{ 255.0, 255.0, 255.0 };
  // set up the beat detector with three hand-picked bin zones
  bd = new BeatZone2(fft, in, 3, in.size, in.sampleRate);
  bd.setZone(0, 0, 100);
  bd.setZone(1, 100, 400);
  bd.setZone(2, 400, 511);
  in.start();
}

// Maximum fft 1.0
// Minimum fft 0.0
// After init to Ess FFT demo that is
void draw(){
  background(242, 240, 174);
  // draw the raw spectrum
  noStroke();
  fill(131, 104, 81);
  for(int i = 0; i < bufferSize/2; i++) {
    rect(10+i, 10, 1, fft.spectrum[i]*200);
  }
  // flash each zone when a beat registers, then fade back out
  stroke(0);
  for(int i = 0; i < bd.zoneE.length; i++){
    bdFill[i] = bd.beat[i] ? 255.0 : bdFill[i] > 0.0 ? bdFill[i] - 5 : 0.0;
    fill(color(250, 186, 10, bdFill[i]));
    rect(10+bd.zoneStart[i], 10, bd.zoneEnd[i] - bd.zoneStart[i], 200);
    line(10+bd.zoneStart[i], bd.zoneE[i]*200, 10+bd.zoneEnd[i], bd.zoneE[i]*200);
  }
}

public void audioInputData(AudioInput theInput) {
  fft.getSpectrum(in);
  bd.read();
}

public void stop() {
  Ess.stop();
  super.stop();
}

// Averages the spectrum over user-defined bin zones and flags a beat
// whenever a zone's current energy rises above its rolling average.
class BeatZone2{
  FFT fft;
  AudioInput in;
  float [] zoneE;        // current average energy per zone
  int [] zoneStart;
  int [] zoneEnd;
  float [][] zoneBuffer; // recent energy history per zone
  boolean [] beat;
  float decay = 0.009;   // (not used yet)
  int playhead = 0;      // write position in the circular history buffer
  int bufferLength = 40; // frames of history to average against

  BeatZone2(FFT fft, AudioInput in, int numZone, int bins, float rate){
    this.fft = fft;
    this.in = in;
    zoneE = new float[numZone];
    zoneStart = new int[numZone];
    zoneEnd = new int[numZone];
    beat = new boolean[numZone];
    zoneBuffer = new float[numZone][];
  }

  void read(){
    for(int i = 0; i < zoneE.length; i++){
      beat[i] = false;
      zoneE[i] = getZoneE(i);
      // beat = this frame's zone energy beats the recent average
      if(zoneE[i] > getBufferAve(i)){
        beat[i] = true;
      }
      loadBuffer(i, zoneE[i]);
    }
    // the playhead overwrites the oldest entry next time round - no array copying
    playhead = (playhead + 1) % bufferLength;
  }

  void setZone(int num, int start, int end){
    zoneStart[num] = start;
    zoneEnd[num] = end;
    zoneBuffer[num] = new float[bufferLength];
  }

  // average spectrum energy across one zone's bins
  float getZoneE(int num){
    float e = 0.0;
    for(int i = zoneStart[num]; i < zoneEnd[num]; i++){
      e += fft.spectrum[i];
    }
    return e / (zoneEnd[num] - zoneStart[num]);
  }

  void loadBuffer(int num, float val){
    zoneBuffer[num][playhead] = val;
  }

  // average of the zone's recent history
  float getBufferAve(int num){
    float a = 0.0;
    for(int i = 0; i < zoneBuffer[num].length; i++){
      a += zoneBuffer[num][i];
    }
    return a / zoneBuffer[num].length;
  }
}
Re: Beat Detection
Reply #13 - Mar 19th, 2007, 6:08pm
 
st33d wrote on Mar 18th, 2007, 6:17pm:
...I don't want Ess doing the averages... I want "from bin 34 to bin 100"...


Yep, the default linear buffers aren't very useful, and I've just been doing the same sort of thing myself; feel free to borrow ideas from:
http://www.davebollinger.com/works/p5/fftoctana/

In particular, if you wanted to hack it for your 3-band use, look into how the spe2avg map is built and rebuild it for your purposes.
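The map is just a lookup from spectrum bin to output band, so a 3-band version would be built along these lines (sketch only - the bin boundaries here are made up, not the actual fftoctana code):

Code:

// sketch: assign each spectrum bin to one of three hand-picked bands
int[] spe2avg = new int[512];
int[] bandEnd = { 100, 400, 512 };   // made-up boundaries: lows, mids, highs
int band = 0;
for (int bin = 0; bin < spe2avg.length; bin++) {
  if (bin >= bandEnd[band]) band++;
  spe2avg[bin] = band;
}
// per frame, accumulate the spectrum into the bands through the map
float[] avg = new float[3];
int[] count = new int[3];
for (int bin = 0; bin < spe2avg.length; bin++) {
  avg[spe2avg[bin]] += fft.spectrum[bin];
  count[spe2avg[bin]]++;
}
for (int b = 0; b < 3; b++) avg[b] /= count[b];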
Re: Beat Detection
Reply #14 - Mar 19th, 2007, 8:03pm
 
st33d: I'm not entirely satisfied with the way I am managing the array buffers either, though I don't quite understand what you are suggesting as an alternative. The essentials of my algo are in the sEnergy() function. The only difference between that and the fEnergy() function is that fEnergy() performs the algorithm on each average band individually. sEnergy() is pretty heavily commented, so you should be able to suss it out.
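If it helps with the equations, the test from the gamedev article (which I used as my starting point) boils down to something like this - paraphrased from memory rather than lifted from sEnergy():

Code:

// paraphrase of the gamedev-article energy test (not the exact sEnergy() source)
// history[] holds the instant energies of roughly the last second of blocks,
// e is the instant energy of the newest block
boolean isBeat(float e, float[] history) {
  float avg = 0;
  for (int i = 0; i < history.length; i++) avg += history[i];
  avg /= history.length;
  float variance = 0;
  for (int i = 0; i < history.length; i++) {
    float d = history[i] - avg;
    variance += d * d;
  }
  variance /= history.length;
  // the article's sensitivity constant: higher variance lowers the bar
  float C = -0.0025714 * variance + 1.5142857;
  return e > C * avg;
}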