I know I have seen a few sketches that do something similar to this, but I don't have links at the moment. In general, you might want to do a Google search for sonification (or auralization) of images.
You could certainly read an image pixel by pixel and map each pixel to pitch (for example). I don't know of any libraries that specify mappings, but that is the fun part anyway and something you probably want creative control over.
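To make the pixel-to-pitch idea concrete, here is a minimal sketch in plain Java, assuming pixels arrive as packed ARGB ints (the format of Processing's pixels[] array). The class name, the method names, the brightness weighting, and the note range are all my own illustrative choices, not anything from an existing library; swap in whatever mapping sounds good to you.

```java
// Hypothetical helper: maps pixel brightness to MIDI note numbers.
class PixelToPitch {
    // Approximate perceived brightness (0-255) of a packed ARGB pixel,
    // using the common 0.299/0.587/0.114 luma weights.
    static int brightness(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    // Linearly map a brightness of 0..255 onto the note range lowNote..highNote.
    static int toMidiNote(int brightness, int lowNote, int highNote) {
        return lowNote + Math.round(brightness * (highNote - lowNote) / 255.0f);
    }

    // Convert a whole pixel array into one note per pixel, in reading order.
    static int[] notesFor(int[] pixels, int lowNote, int highNote) {
        int[] notes = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            notes[i] = toMidiNote(brightness(pixels[i]), lowNote, highNote);
        }
        return notes;
    }
}
```

In a Processing sketch you would call loadPixels() on the image first and then pass its pixels array to notesFor. One note per pixel is a lot of notes, so you will probably want to sample every Nth pixel or average over blocks.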
Essentially you want to come up with some sort of relationship between a particular property of an image (or a pixel) and a particular property of sound. I have some experience with this, though not with graphics. I have a program that takes physiological data and creates music in real time. My mapping is fairly straightforward in that an increase in "excitement" causes the pitch to go up. In the interest of aesthetics, I have many overlapping ranges so that I get chords and polyphony. Couple that with some high-quality VST synths and you can make beautiful music.
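The overlapping-ranges idea can be sketched roughly like this. To be clear, this is not the code from my program, just an illustration under simple assumptions: each hypothetical Voice listens to a slice of a normalized input value and maps it linearly onto its own note range. Where slices overlap, one input value triggers several notes at once, which is where the chords come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping input ranges producing chords.
class OverlappingVoices {
    // One voice: responds to inputs in [inLow, inHigh], plays notes in [noteLow, noteHigh].
    static final class Voice {
        final double inLow, inHigh;
        final int noteLow, noteHigh;

        Voice(double inLow, double inHigh, int noteLow, int noteHigh) {
            this.inLow = inLow;
            this.inHigh = inHigh;
            this.noteLow = noteLow;
            this.noteHigh = noteHigh;
        }

        boolean covers(double x) {
            return x >= inLow && x <= inHigh;
        }

        // Higher input within the voice's slice gives a higher pitch.
        int noteFor(double x) {
            double t = (x - inLow) / (inHigh - inLow);
            return noteLow + (int) Math.round(t * (noteHigh - noteLow));
        }
    }

    // Collect one note from every voice whose input slice contains x.
    static List<Integer> notesFor(double x, List<Voice> voices) {
        List<Integer> notes = new ArrayList<>();
        for (Voice v : voices) {
            if (v.covers(x)) {
                notes.add(v.noteFor(x));
            }
        }
        return notes;
    }
}
```

For example, with one voice covering 0.0 to 0.6 and another covering 0.4 to 1.0, any input between 0.4 and 0.6 sounds two notes together, while inputs outside the overlap sound only one.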
One idea to consider:
Rather than play the sound in Processing, try sending the MIDI data to a DAW (Pro Tools, Sonar, Cubase, Logic, Ableton, whatever) and let the DAW handle the music. Depending on the purpose of your application, this may get you better results.
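Here is a rough sketch of the sending side using the standard javax.sound.midi package, which is available from Processing since it runs on Java. It assumes you have some MIDI output port that the DAW is listening to (for instance a virtual port such as the IAC Driver on macOS or loopMIDI on Windows); the class name, method names, and the port-name matching are my own illustrative choices, and the port name you pass in depends entirely on your setup.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MidiDevice;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.MidiUnavailableException;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

// Hypothetical helper for sending notes to an external MIDI port.
class MidiOut {
    // Build a note-on or note-off message (channel 0-15, note and velocity 0-127).
    static ShortMessage noteMessage(boolean on, int channel, int note, int velocity) {
        int command = on ? ShortMessage.NOTE_ON : ShortMessage.NOTE_OFF;
        try {
            return new ShortMessage(command, channel, note, velocity);
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(
                "channel must be 0-15, note and velocity 0-127", e);
        }
    }

    // Find the first device that accepts MIDI and whose name contains nameFragment.
    // Returns null if nothing matches.
    static MidiDevice findOutput(String nameFragment) throws MidiUnavailableException {
        for (MidiDevice.Info info : MidiSystem.getMidiDeviceInfo()) {
            MidiDevice device = MidiSystem.getMidiDevice(info);
            boolean acceptsMidi = device.getMaxReceivers() != 0; // -1 means unlimited
            if (acceptsMidi && info.getName().contains(nameFragment)) {
                return device;
            }
        }
        return null;
    }

    // Send one note on channel 0 to the named port, hold it, then release it.
    static void playNote(String portName, int note, int velocity, long millis)
            throws MidiUnavailableException, InterruptedException {
        MidiDevice device = findOutput(portName);
        if (device == null) {
            throw new MidiUnavailableException("No MIDI output matching: " + portName);
        }
        device.open();
        try {
            Receiver out = device.getReceiver();
            out.send(noteMessage(true, 0, note, velocity), -1); // -1 = no timestamp
            Thread.sleep(millis);
            out.send(noteMessage(false, 0, note, 0), -1);
        } finally {
            device.close();
        }
    }
}
```

Something like MidiOut.playNote("IAC", 60, 100, 500) would then send middle C for half a second, provided a port with "IAC" in its name exists on your machine. In the DAW, arm a track with that port as its MIDI input and put your synth on it.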
Cheers,
Darin