cheesecake, you will find that using loadPixels()/updatePixels() and accessing the pixels[] array directly is a good deal quicker than calling get() like this.
Assuming theImage and imageCopy are the same size, once you've called imageCopy.loadPixels(), the pixel at any given (i, j) is imageCopy.pixels[i + j*imageCopy.width]. Neighbor addresses are calculated the same way; for example, the pixel to the right is at imageCopy.pixels[(i+1) + j*imageCopy.width].
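For reference, here is roughly what the brute-force version looks like with that addressing. It's a minimal sketch in plain Java rather than a Processing sketch: a plain int[] of packed ARGB values stands in for pixels[] (same row-by-row layout), and the edge handling (clamping to the nearest edge pixel) and the class/method names are my own assumptions.

```java
// Brute-force 3x3 box blur over a packed-ARGB int[] laid out row by row,
// the same layout as PImage.pixels[]. Hypothetical helper, not a Processing API.
final class BoxBlur {

    // Clamp v into [lo, hi] so neighbors past the edge reuse the edge pixel (assumed edge policy).
    static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }

    // Returns a new array where each pixel is the mean of its 3x3 neighborhood.
    static int[] blur3x3(int[] src, int width, int height) {
        int[] dst = new int[src.length];
        for (int j = 0; j < height; j++) {
            for (int i = 0; i < width; i++) {
                int a = 0, r = 0, g = 0, b = 0;
                for (int dj = -1; dj <= 1; dj++) {
                    for (int di = -1; di <= 1; di++) {
                        int x = clamp(i + di, 0, width - 1);
                        int y = clamp(j + dj, 0, height - 1);
                        int c = src[x + y * width]; // same formula as pixels[i + j*width]
                        a += (c >>> 24) & 0xFF;
                        r += (c >>> 16) & 0xFF;
                        g += (c >>> 8) & 0xFF;
                        b += c & 0xFF;
                    }
                }
                // Integer division truncates each channel average.
                dst[i + j * width] = ((a / 9) << 24) | ((r / 9) << 16) | ((g / 9) << 8) | (b / 9);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // 3x3 opaque black image with one white pixel in the middle.
        int[] img = new int[9];
        java.util.Arrays.fill(img, 0xFF000000);
        img[1 + 1 * 3] = 0xFFFFFFFF;
        int[] out = blur3x3(img, 3, 3);
        System.out.printf("center = %08X%n", out[1 + 1 * 3]);
    }
}
```

Every output pixel costs nine reads plus one write, which is where the per-pixel lookup count below comes from.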
But look again at the example I gave above: it lets the hardware do the job for you via tint() blending (we just add nine copies, each with the alpha set to 1/9). This refactors the problem. Instead of computing ((c1+c2+...+c9)/9.0) at every pixel, with lots of brute-force steps to find each of the nine "c"s, we compute (c1/9.0 + c2/9.0 + ... + c9/9.0) for all pixels at once, which gives the same result.
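To check that the two orderings really agree, here's a small plain-Java sketch on a single-channel float image. It assumes the nine copies are accumulated additively, each pre-scaled by 1/9, which is the idea described above; it does not model Processing's actual blend equation. On the CPU both versions do the same amount of arithmetic. The point is only that the layered form is nine whole-image passes, which is the shape a drawing/blending pipeline can do for you. Names and the clamped-edge policy are hypothetical.

```java
// Compares (c1 + ... + c9) / 9 per pixel against c1/9 + ... + c9/9
// accumulated as nine whole-image shifted copies. Hypothetical helper class.
final class LayeredBlur {

    static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }

    // Per pixel: gather the nine neighbors, then divide once.
    static float[] perPixel(float[] src, int w, int h) {
        float[] dst = new float[src.length];
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                float sum = 0f;
                for (int dj = -1; dj <= 1; dj++)
                    for (int di = -1; di <= 1; di++)
                        sum += src[clamp(i + di, 0, w - 1) + clamp(j + dj, 0, h - 1) * w];
                dst[i + j * w] = sum / 9f;
            }
        }
        return dst;
    }

    // Layered: one pass per offset, each adding a shifted copy of the
    // whole image scaled by 1/9 (the "nine copies at 1/9 alpha" idea).
    static float[] layered(float[] src, int w, int h) {
        float[] dst = new float[src.length];
        for (int dj = -1; dj <= 1; dj++)
            for (int di = -1; di <= 1; di++)
                for (int j = 0; j < h; j++)
                    for (int i = 0; i < w; i++)
                        dst[i + j * w] += src[clamp(i + di, 0, w - 1) + clamp(j + dj, 0, h - 1) * w] / 9f;
        return dst;
    }

    public static void main(String[] args) {
        int w = 4, h = 3;
        float[] img = new float[w * h];
        for (int k = 0; k < img.length; k++) img[k] = k * 10f;
        float[] a = perPixel(img, w, h);
        float[] b = layered(img, w, h);
        float maxDiff = 0f;
        for (int k = 0; k < a.length; k++) maxDiff = Math.max(maxDiff, Math.abs(a[k] - b[k]));
        System.out.println("max difference: " + maxDiff);
    }
}
```

Any difference you see is floating-point rounding, since dividing each term by 9 and summing is algebraically the same as summing and dividing once.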
The code above does it all in nine passes plus a call to tint(). For even a small 100x100 image, the brute-force way takes (9+1)*100*100 = 100,000 address lookups (the +1 is for the destination), versus ten calls, roughly a 10,000:1 difference in the number of operations your sketch has to issue :)
I've drawn to the screen in the sample, but you could always target another texture via PGraphics.
kb,
http://www.riftgame.com/