FFT to identify bird's singing

edited March 2018 in Android Mode

Hello, I'm sorry if I'm not posting in the right category; I'm new to the forum.

I'm working on a hiking-assistant project for Android, and I thought of a feature that could be fun: recognizing a bird's song. The idea is to record a bird that you pass by and have the database tell you which species it is.

I'm not a pro at sound processing, but I thought that implementing an FFT could help. I was planning on using the FFT to get the max and min amplitudes and compare them with the database's pre-processed information. Of course, I don't plan on using only the min-max indicators.

I based my code on this: https://github.com/blanche/shayam/blob/master/java/at.lw.shayam/src/at/lw/shayam/AudioAnalysis.java and as much as I understand the maths behind the Fourier transform, I don't get everything in that code.

So here are my questions:

  1. Are the chunks used to speed up the FFT computation? If we have 2^n chunks, will we have 2^n smaller FTs processed?

  2. The results[][] 2D Complex array contains... complex numbers. But I don't understand what x and y are in results[x][y], or how to find the frequency and the amplitude. (Of course, I'll have to convert the complex numbers to doubles.)

  3. Do you think this approach is enough? The project is not professional, so I'm not trying to get a recognition rate of 100%!
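For questions 1 and 2, a minimal sketch may help. It assumes (this is a guess at the usual layout, not something confirmed from the linked file) that results[x][y] is indexed as x = chunk number, i.e. position in time, and y = frequency bin within that chunk. Under that assumption, chunking is less about speed and more about seeing how the spectrum changes over time; the chunk size is normally a power of two because most FFT implementations require it. The class and method names below are hypothetical, and a plain DFT is used instead of an FFT purely to keep the example short; an FFT would give the same magnitudes faster.

```java
import java.util.Arrays;

// Hypothetical illustration class; not taken from the linked project.
class ChunkedSpectrum {

    /** Naive DFT magnitudes of one chunk, O(N^2). An FFT yields the same values faster. */
    public static double[] magnitudes(double[] chunk) {
        int n = chunk.length;
        // For real input, bins above n/2 mirror the lower ones, so only n/2 are kept.
        double[] mags = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += chunk[t] * Math.cos(angle);
                im -= chunk[t] * Math.sin(angle);
            }
            // Amplitude of a complex value is its modulus: sqrt(re^2 + im^2).
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    /** results[x][y]: x = chunk index (time), y = frequency bin (assumed layout). */
    public static double[][] spectrogram(double[] samples, int chunkSize) {
        int chunks = samples.length / chunkSize; // a trailing partial chunk is dropped
        double[][] results = new double[chunks][];
        for (int x = 0; x < chunks; x++) {
            double[] chunk = Arrays.copyOfRange(samples, x * chunkSize, (x + 1) * chunkSize);
            results[x] = magnitudes(chunk);
        }
        return results;
    }

    /** Centre frequency in Hz of bin y for a given sample rate and chunk size. */
    public static double binToHz(int y, double sampleRate, int chunkSize) {
        return y * sampleRate / chunkSize;
    }

    /** Index of the bin with the largest magnitude. */
    public static int loudestBin(double[] mags) {
        int best = 0;
        for (int y = 1; y < mags.length; y++) {
            if (mags[y] > mags[best]) best = y;
        }
        return best;
    }
}
```

With this layout, `binToHz(loudestBin(results[x]), sampleRate, chunkSize)` gives the dominant frequency of chunk x. A larger chunk size gives finer frequency resolution but coarser time resolution, so the choice is a trade-off rather than a pure speed-up.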

Thank you for your answers.



  • edited March 2018

    Just a warning here -- using ~3-10 features as direct values from ambient outdoor audio like that, it might be hard to get to even a 5-10% recognition rate.

    My guess would be that in addition to a big database to match against you need a large feature set and a machine learning algorithm. You also might need some processes to deal with ambient noise, and/or smart ways of figuring out which window of time to examine in a running sound stream or buffer (birds won't sing on cue for you). Another way of making recognition more likely would be to check GPS information and prioritize guesses based on common nearby birds.

    For a recent discussion of commercial apps that try these things -- and still don't really get above 60% on common examples, even when feeding them high-quality professional recordings -- see:


    I'm not saying "don't even try" -- just warning you that it may be quite a bit of work, and if you are correct less than 1 time out of 20 it may be fun to make but not fun to use.
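On the point about choosing which window of time to examine, one very simple starting point is to pick the loudest stretch of the recording. The sketch below is only an assumption about how that could look; the class name, method name and parameters are hypothetical, and energy alone cannot tell a bird from wind or voices, so it is at best a first filter.

```java
// Hypothetical helper: picks the window with the most signal energy.
class LoudestWindow {

    /**
     * Returns the start index of the window with the highest sum of squared samples.
     * For a fixed window size this ranks windows the same way RMS would.
     * Windows start every `hop` samples; returns 0 if no full window fits.
     */
    public static int findLoudest(double[] samples, int windowSize, int hop) {
        int best = 0;
        double bestEnergy = -1;
        for (int start = 0; start + windowSize <= samples.length; start += hop) {
            double energy = 0;
            for (int i = start; i < start + windowSize; i++) {
                energy += samples[i] * samples[i];
            }
            if (energy > bestEnergy) {
                bestEnergy = energy;
                best = start;
            }
        }
        return best;
    }
}
```

The selected window could then be the part that is passed on to the frequency analysis, instead of the whole buffer.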

  • edited March 2018


    I do agree with @jeremydouglass: this is not a simple project at all, neither for Android nor for Java. Why? Because the right way is to use TensorFlow. And while you can import pre-trained models into AS (as for P5, I don't know), there is no way to create and train a model with Java: you need Python to do that. As for me, I have successfully tried Android TensorFlow, but now I am stuck with the pre-trained models. So I tried with Python. It works... But for birds you need a) a database of bird songs, which can be huge, and then b) to construct your model and train it, which is not only FFT (because sound models are constructed like images, so it is quite unpredictable whether the time unit over which a bird repeats its song is the right one...). So, though I am very interested in your project, I think that it is a real challenge... and not for P5!

  • In the meantime, you are very welcome to start building your database of sounds. An even better project at this stage would be to allow people from other regions/locations to upload their data. You need a website, some content management and a good sales pitch to get people involved and volunteering their efforts to contribute to your database by donating recorded sounds with bird information.

    Implementing the FFT by itself could prove to be a fun and simple task (cough cough) but the matching will be a daunting one (no experience here). If you have sample code and a few sample files, you can post them here and I can have a look. Just don't forget to tag me with @kfrajer.


  • Thank you all for your answers. There's a website called xeno-canto, which is a database of bird sounds, the biggest one I found, and the community is very big, so I'm using that as my DB (I already asked the website's owner, and they even have an API to help with downloading sounds). Oh, and most of the recordings are noise-free.

    I know that the recognition won't be perfect and I'm OK with that. As for recording the bird's song in real life, it'll depend on the user. Most birds sing in cycles: for example, I recorded a bird yesterday that sang for 5 seconds, then stopped for 5 seconds, and so on for a minute, then took a break and started again. The only problem is the noise, but I think I can work on that.

    I thought about TensorFlow but it's a pain to implement in Java; maybe in a later version.

    I started coding the FFT. I'm using a Java library so it's not that hard; I just need to understand the output, as my mp3 array is 125000 bytes long and so is the output array.

    I'll post some code later.
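One caveat on the 125000-byte mp3 array: mp3 data is compressed, so running an FFT directly over the file's bytes will not describe the sound wave. The audio normally has to be decoded to PCM samples first. The sketch below assumes the decoding has already happened and that the result is 16-bit little-endian mono PCM (for example the data section of a WAV file); the decoder itself is not shown, and the class and method names are hypothetical.

```java
// Hypothetical helper: turns decoded PCM bytes into samples suitable for an FFT.
class PcmDecode {

    /** Converts 16-bit little-endian mono PCM bytes to samples in [-1, 1). */
    public static double[] toSamples(byte[] pcm) {
        int n = pcm.length / 2; // two bytes per sample; an odd trailing byte is ignored
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];    // high byte keeps its sign
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }
}
```

Note that with 16-bit audio the number of samples is half the number of bytes, so input and output lengths should be counted in samples rather than bytes.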

  • "thought about Tensorflow but it's a pain to implement on java, but maybe on a later version."

    Possibly relevant:

  • @jeremydouglass=== yes, and I have got it to work, but the real problem is the models: I don't think there is a model for birdsong, and creating a model is very long and difficult work that you have to do with Python.
