Processing Forum
I’ve been working on a library that makes it easy to use microphone input and transcribe it to text. It uses the same technology that Google Chrome uses for its input fields, where you can optionally use your voice instead of your keyboard. You can read more about it on the project page.

Please let me know if you have suggestions on how I could improve the design of the library or anything else!


Replies(19)

I just wanted to announce some improvements to the stability of the library. Please report any bugs that you come across! Also: I’m interested in projects that make use of my library. 
I've been experimenting with this and hopefully I'll have some interesting results to share soon.

Thanks for the library, by the way. Nice documentation!
Hey.

The Graffiti Research Lab France is using your library in its tagEULE project:
We use speech recognition to display graffiti using fonts based on GML files, along with the GML4U library.

It's been really useful and saved us some dev time.

However, we seem to run into problems when using our installation in a fairly noisy environment (outside or in a crowd) and with "not as good as expected" Wi-Fi coverage.
I tried different things, such as:
Case 1 - Setting the threshold manually
Case 2 - Disabling autorecord

Under those conditions, my first impression is that:
Case 1: the recording constantly starts and stops, and we either get partial results or queue too many recordings.
Case 2: it doesn't work very well (probably because of the noise around us).

We also noticed that deep voices are recognized better than high-pitched ones (in French).
I don't know if there is a way to filter voices so that they fit the range of voices the Google API can handle.

Is that something you are aware of, or do you have any workarounds?

I'm going to try debugging in Eclipse to get a better understanding of how it works exactly and to spot where our issues come from.

Will let you know if I find anything useful to improve the lib.

Thanks for the lib, and keep up the good work.

++

J

Actually the service from Google is pretty limited regarding the length of the recording. 

I’ve always tested under good conditions with little background noise. If the recording itself is too poor (you could test it directly in Chrome), you may need to work on the hardware. On the other hand, my auto-record handling is very simple: if something is louder than the threshold value, I record until the volume stays below it for at least half a second.

I’m sure one could improve the auto-record and better distinguish between noise and actual speech by, for example, constantly averaging the volume.
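To make the idea above concrete, here is a minimal sketch of a threshold gate with a half-second hold time that also keeps a running average of the background volume. It is not the library's actual code: the class name `AutoRecordGate`, its parameters, and the choice of an exponential moving average are all assumptions made for illustration.

```java
/**
 * Hypothetical helper, NOT part of the STT library.
 * Decides whether recording should be active, based on the current input level.
 */
class AutoRecordGate {
    private final float margin;     // how far above the noise floor counts as "loud"
    private final float smoothing;  // weight of each new sample in the running average (0..1)
    private final long holdMillis;  // keep recording this long after the last loud sample
    private float noiseFloor;       // running average of the level while idle
    private boolean recording = false;
    private long lastLoudMillis = 0;

    AutoRecordGate(float initialFloor, float margin, float smoothing, long holdMillis) {
        this.noiseFloor = initialFloor;
        this.margin = margin;
        this.smoothing = smoothing;
        this.holdMillis = holdMillis;
    }

    /** Feed one level sample with its timestamp; returns true while recording should run. */
    boolean update(float level, long nowMillis) {
        boolean loud = level > noiseFloor + margin;
        if (loud) {
            recording = true;
            lastLoudMillis = nowMillis;
        } else {
            // Stop only after the level has stayed quiet for the whole hold time.
            if (recording && nowMillis - lastLoudMillis >= holdMillis) {
                recording = false;
            }
            // Adapt the noise floor only while idle, so speech does not raise it.
            if (!recording) {
                noiseFloor += smoothing * (level - noiseFloor);
            }
        }
        return recording;
    }

    float getNoiseFloor() {
        return noiseFloor;
    }
}
```

With a hold time of 500 ms this mirrors the "below the threshold for at least half a second" rule, while the moving noise floor lets the effective threshold follow a slowly changing background, such as a crowd.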

At the moment I have no resources to work on the library, but there are a few other things that I still need to work on (e.g. using the library to transcribe pre-recorded files).

I’ll keep you updated!
This is one of the coolest Processing libraries I've seen! Transcribing is so easy!
The results were decent enough for my use.

Great job! Thanks for sharing.

It says "Speech could not be interpreted". Any ideas why? I'm using the example from Florian's site.
Is there a link to the updated library, or did you just replace your older one? Also, great work, by the way; I will let you know when and where I might be using it. What do you think about the new dictation on Mountain Lion? And do you have a list of the languages this supports?
The most recent library is available at www.stt.getflourish.com
I haven't looked into Mountain Lion's dictation yet, but hopefully we can utilize that to get even better results!

Supported languages for STT are all major ones that Google supports. 

For example: en, de, fr, es, and even Chinese, which is zh.
Hi guys,

I am trying to use this amazing library, but I keep getting a Minim error. I have tried to fix it, but no luck so far... Has anybody else had the same problem?

So the error I am getting is this:

13:17:59 STT info: Manual mode enabled. Use begin() / end() to manage recording.

==== JavaSound Minim Error ====
==== AudioRecorder.save: Error attempting to save buffer to /Users/nikolaoschandolias/Documents/workspace/STT/bin/data2013-01-27-13-17-59/0.wav, the output file is empty.

==== JavaSound Minim Error ====
==== Unsupported Audio File: not a MPEG stream:null

Exception in thread "Animation Thread" java.lang.NullPointerException
at ddf.minim.javasound.JSBufferedSampleRecorder.save(JSBufferedSampleRecorder.java:173)
at ddf.minim.AudioRecorder.save(AudioRecorder.java:107)
at com.getflourish.stt.STT.startListening(STT.java:422)
at com.getflourish.stt.STT.onBegin(STT.java:367)
at com.getflourish.stt.STT.begin(STT.java:133)
at com.getflourish.stt.LibTest.keyPressed(LibTest.java:30)
at processing.core.PApplet.handleKeyEvent(PApplet.java:2931)
at processing.core.PApplet.dequeueEvents(PApplet.java:2466)
at processing.core.PApplet.handleDraw(PApplet.java:2153)
at processing.core.PGraphicsJava2D.requestDraw(PGraphicsJava2D.java:193)
at processing.core.PApplet.run(PApplet.java:2020)
at java.lang.Thread.run(Thread.java:680)


I am using Processing 2.07 (I have tried it with 1.51 too) and I am on Mountain Lion. Do you have any idea what is wrong, or a possible solution?

Regards,
Nikos
Hey Nikos, it's working on my machine with the same version of Processing and OS X. Are you using it from Eclipse?
Hey Florian,

Thanks for the answer! I have found the solution: there was a problem with the WAV file encoding.
It works perfectly now!

I run it directly in Processing, but I might move the whole project to Eclipse.

I will keep you posted with further news on my project!

Thanks once again!
Awesome! I'm glad to hear that it works now. Good luck with your project, and let me know when you have anything to try or show :)
Hi Florian, I have been trying to use the library. I tried the example on the project page, but I keep getting an error message saying class STT not found.

Exception in thread "Animation Thread" java.lang.NoClassDefFoundError: ddf/minim/Recordable
at sketch_130204b.setup(sketch_130204b.java:33)
at processing.core.PApplet.handleDraw(PApplet.java:2103)
at processing.core.PGraphicsJava2D.requestDraw(PGraphicsJava2D.java:190)
at processing.core.PApplet.run(PApplet.java:2006)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: ddf.minim.Recordable
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 5 more

Thanks,
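
A note on the trace above: the `ClassNotFoundException` names `ddf.minim.Recordable`, so the class that could not be loaded at runtime belongs to Minim rather than to STT itself. A small check like the following can confirm whether a given class is visible on the classpath. The class name `ClasspathCheck` and its method are made up for this illustration, and the check only reports the symptom; how Minim gets onto the classpath depends on your setup.

```java
/** Hypothetical diagnostic helper; not part of STT or Minim. */
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "ddf.minim.Recordable"; // the class named in the stack trace
        System.out.println(name + (isOnClasspath(name) ? " was found" : " was NOT found"));
    }
}
```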
Hi again everybody :)

I am interested to know if someone has attempted to implement this library in plain Java instead of Processing... Does anybody have any idea where I should start?

Florian, did you write this in Java first and then turn it into a Processing lib?

Could you please let me know?
Thanks in advance!

Yes, I've built the library with Java in Eclipse and only the main class depends on PApplet to fire the transcription events. You should be able to extract that and get rid of the Processing part if you need to. 

And here is the full repository:  https://github.com/getflourish/STT
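
For anyone attempting that extraction, one common way to replace PApplet-based event firing in plain Java is a listener interface. The sketch below only illustrates the pattern: `TranscriptionEvents`, `TranscriptionListener`, and the `(utterance, confidence)` signature are hypothetical names, not taken from the repository, so check the actual source for the real event signature.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event hub; the names here are not from the STT repository. */
class TranscriptionEvents {
    /** Assumed callback shape; the real library's event may carry different data. */
    interface TranscriptionListener {
        void onTranscription(String utterance, float confidence);
    }

    // Thread-safe list, since results may arrive from a background thread.
    private final List<TranscriptionListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(TranscriptionListener listener) {
        listeners.add(listener);
    }

    /** Notify every registered listener of a finished transcription. */
    void fire(String utterance, float confidence) {
        for (TranscriptionListener listener : listeners) {
            listener.onTranscription(utterance, confidence);
        }
    }
}
```

A plain Java program would then register a callback, for example `events.addListener((text, conf) -> System.out.println(text));`, instead of relying on a method defined in a Processing sketch.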
 
Thanks for the library, but when it runs for more than 20 minutes in autorecord mode, an OutOfMemoryError occurs (increasing the amount of available memory does not help).
This is a super awesome library!
It finally brings me close to easy voice control of Arduinos driven from a Processing sketch.
I wonder if I can use it for controlling an indoor airship; it should be reasonably fast.
Thanks for your feedback! As others have already mentioned, you should check how the library performs over time. This is an ongoing issue that I haven't been able to fix yet. So better test before your airship drops ;)