Fast blurring
1 year ago
Hey all,
TL;DR:
I've been working on faster variants of filter(BLUR), filter(DILATE) and filter(ERODE). They also play nicely with code that manipulates the pixel array a lot. Up to fourteen times faster than the normal blur! About twice as fast as Mario Klingemann's Super Fast Blur, but note that this is only because I specifically optimised for a 3x3 kernel; his version is faster for bigger kernels. See the code in action here, with source:
Added downside/upside: the filters wrap around at the image edges (this suited my needs).
I've been trying out some glow/dissolve effects in my code. Combining filter(BLUR), filter(DILATE) and filter(ERODE) gives results that look nice, but the problem is that filter(WHATEVER) is very slow, especially BLUR.
Note: I'm mostly coding on a netbook, without OpenGL. Otherwise I'd use the GLGraphics library and apply shaders. Also, I tend to use per-pixel manipulation a lot in my code, and we all know loadPixels()/updatePixels() and OpenGL don't really like one another (again, can be fixed with textures and shaders, but not on my netbook).
Also it felt like a complete waste of time to use
- updatePixels();
- filter(SOMETHING);
- loadPixels();
... every time I wanted to apply a filter to the pixels I was messing around with.
So, I've tried to build faster filters that manipulate the pixel array directly, then leave it alone. I actually ended up making a second buffer, but hey, it suits my needs and doesn't make the code that much more complex to use.
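For anyone wondering what the second buffer looks like in practice, here is a minimal sketch of the idea. It is not the code from the applet: the class name `PixelBuffers`, the `Pass` interface and the `run` method are all hypothetical names I made up for illustration.

```java
// Hypothetical helper (not taken from the applet): two pixel arrays that trade
// places after every pass, so a filter only ever reads unmodified pixels.
final class PixelBuffers {
    // A pass reads from source and is expected to write every element of target.
    interface Pass {
        void apply(int[] source, int[] target);
    }

    int[] src;  // the current frame, only read during a pass
    int[] dst;  // scratch frame, only written during a pass

    PixelBuffers(int[] pixels) {
        src = pixels.clone();
        dst = new int[pixels.length];
    }

    // Runs one pass and swaps the buffers, so src holds the newest frame afterwards.
    void run(Pass pass) {
        pass.apply(src, dst);
        int[] tmp = src;
        src = dst;
        dst = tmp;
    }
}
```

In a sketch you would presumably call loadPixels() once, run as many passes as you like on the two arrays, and only copy `src` back into `pixels` right before updatePixels().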
I made these assumptions:
- multiplication is faster than division
- shifting is faster than either of those
- reads are faster than writes
- numeric constants are faster than variables
- bitwise AND is probably quite cheap as well
- last but not least, the JVM is smart enough to optimise something as crazy as:
- for (int i = 1; i < (width-1); ++i){
- for (int j = 1; j < (height-1); ++j){
- t[i + j*width] = (((((s[i + j*width] & 0xFF) << 2 ) +
- (s[i + 1 + j*width] & 0xFF) +
- (s[i - 1 + j*width] & 0xFF) +
- (s[i + (j+1)*width] & 0xFF) +
- (s[i + (j-1)*width] & 0xFF)) >> 3) & 0xFF) +
- (((((s[i + j*width] & 0xFF00) << 2 ) +
- (s[i + 1 + j*width] & 0xFF00) +
- (s[i - 1 + j*width] & 0xFF00) +
- (s[i + (j+1)*width] & 0xFF00) +
- (s[i + (j-1)*width] & 0xFF00)) >> 3) & 0xFF00) +
- (((((s[i + j*width] & 0xFF0000) << 2 ) +
- (s[i + 1 + j*width] & 0xFF0000) +
- (s[i - 1 + j*width] & 0xFF0000) +
- (s[i + (j+1)*width] & 0xFF0000) +
- (s[i + (j-1)*width] & 0xFF0000)) >> 3) & 0xFF0000) +
- 0xFF000000;
- }
- }
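To make the masking trick above easier to follow, here is a self-contained sketch of the same 1-4-1 kernel with wrap-around at the edges. It is a rewrite for illustration, not the applet's source: the class `ShiftBlurSketch` and the helper `channel` are hypothetical names, and the wrap-around is done with a plain modulo, which may not be how the original does it.

```java
// Hedged sketch, not the applet's source: the 1-4-1 cross kernel on packed
// ARGB ints, wrapping around at the image edges.
final class ShiftBlurSketch {
    // Blurs the channel selected by mask (0xFF, 0xFF00 or 0xFF0000) in place,
    // without shifting it down to 0..255 first. Worst case for red is
    // 8 * 0xFF0000 = 0x7F80000, which still fits in a signed int.
    static int channel(int c, int l, int r, int u, int d, int mask) {
        return ((((c & mask) << 2) + (l & mask) + (r & mask)
                + (u & mask) + (d & mask)) >> 3) & mask;
    }

    // Reads from s, writes to t; both hold w * h packed pixels.
    static void blur(int[] s, int[] t, int w, int h) {
        for (int y = 0; y < h; y++) {
            int row = y * w;
            int up = ((y + h - 1) % h) * w;    // row above, wrapping to the bottom
            int down = ((y + 1) % h) * w;      // row below, wrapping to the top
            for (int x = 0; x < w; x++) {
                int c = s[row + x];
                int l = s[row + (x + w - 1) % w];
                int r = s[row + (x + 1) % w];
                int u = s[up + x];
                int d = s[down + x];
                t[row + x] = channel(c, l, r, u, d, 0xFF)
                           | channel(c, l, r, u, d, 0xFF00)
                           | channel(c, l, r, u, d, 0xFF0000)
                           | 0xFF000000;       // force full alpha, as in the snippet above
            }
        }
    }
}
```

The final mask throws away the bits that leaked down from the channel's low end during the shift, which is why the three channels can be summed without touching each other.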
Here are the results of my own sloppy, not very trustworthy benchmark:
filter(BLUR)  532467 - 26931 = 505536  factor:  1.00
divBlur       130583 - 26931 = 103652  factor:  4.87
shiftBlur      84658 - 26931 =  57727  factor:  8.75
shiftBlur3     82710 - 26931 =  55779  factor:  9.06
shiftBlur2     76136 - 26931 =  49205  factor: 10.2
shiftBlur1     73402 - 26931 =  46471  factor: 10.8
I only discovered Klingemann's filter after benchmarking, when I checked whether other people had done something like this. I quickly added it; it's slightly faster than divBlur, but still slower than the shiftBlurs.
Obviously, these blurs behave differently from filter(BLUR). Here are the kernels:
divBlur:
0 1 0
1 1 1
0 1 0
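As a rough idea of what a division-based version of this kernel could look like on packed pixels, here is a small sketch. The names `DivBlurSketch`, `average` and `pixel` are hypothetical, and this is my reconstruction from the kernel alone, not the divBlur code from the applet.

```java
// Hedged sketch of a division-based cross blur: five equal weights, divide by 5.
final class DivBlurSketch {
    // Averages one channel (mask 0xFF, 0xFF00 or 0xFF0000) in its packed position.
    // 5 * 0xFF0000 still fits in a signed int, so the sum cannot overflow.
    static int average(int c, int l, int r, int u, int d, int mask) {
        return (((c & mask) + (l & mask) + (r & mask)
               + (u & mask) + (d & mask)) / 5) & mask;
    }

    // Combines the three averaged channels and forces full alpha.
    static int pixel(int c, int l, int r, int u, int d) {
        return average(c, l, r, u, d, 0xFF)
             | average(c, l, r, u, d, 0xFF00)
             | average(c, l, r, u, d, 0xFF0000)
             | 0xFF000000;
    }
}
```

The only difference from the shift versions is the `/ 5`, which is exactly the operation the power-of-two kernels are designed to avoid.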
shiftBlur: any combination of A and B you want:
0 A 0
A B A
0 A 0
You also define how far it shifts. Note that by choosing a shift whose divisor is larger or smaller than the actual sum of the kernel, you can blur and fade to black or white at the same time! The linked example uses the shiftBlur3 kernel.
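Here is the arithmetic behind the fade on a single 0..255 channel value, as a hedged sketch. `FadeSketch` and `blend` are hypothetical names, `a` is the weight of the four neighbours and `b` the weight of the centre. The clamp to 255 is my own addition: the post does not say how the original handles a sum that overflows the channel.

```java
// Hedged sketch: one channel of a weighted cross kernel with a free shift.
final class FadeSketch {
    // centre and the four neighbours are plain 0..255 channel values;
    // neighbourSum is left + right + up + down.
    static int blend(int centre, int neighbourSum, int a, int b, int shift) {
        int v = (b * centre + a * neighbourSum) >> shift;
        return Math.min(v, 255);  // assumption: clamp instead of letting the value overflow
    }
}
```

With A = 1 and B = 4 the weights sum to 8, so a shift of 3 keeps a flat area of value 200 at 200, a shift of 4 divides by 16 and halves it to 100 (fading to black), and a shift of 2 divides by 4 and doubles it, which hits the 255 ceiling here.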
shiftBlur3:
00 51 00
51 52 51
00 51 00
shiftBlur2:
0 3 0
3 4 3
0 3 0
shiftBlur1:
0 1 0
1 4 1
0 1 0
Any ideas for further improvements are welcome. I'll probably update this in the coming days to accept any PImage or PGraphics, and versions that don't wrap.
Again, applet version with ridiculously long and ugly source code can be found here:
EDIT: Updated with Klingemann's optimisation. Now between ten and fourteen times faster than regular blur on my netbook!