Hello, I'm testing a sketch across two machines and getting very different performance. The sketch in question is SableRalph's port of an NTSC filter shader from shadertoy.com. I'm just doing println(frameRate).
On a 2013 MacAir (on battery power) with a 1.7 GHz Core i7 CPU + integrated Intel HD Graphics 5000 / OpenGL 4.1 Intel 8.18.29 I get 60fps with/without the shader applied. With my 2008 iMac 3.06 GHz Intel Core 2 Duo + NVIDIA GeForce 8800 GS / OpenGL 3.3 Nvidia 8.18.28 I get 20fps with the shader, 60fps without.
Conversely, running the Geeks3D GpuTest results in the iMac having a slightly better benchmark - 1065 points vs MacAir 1001.
How can that be - is Processing running in software rendering mode because the 2008 graphics card is too old? Is there any way of verifying something like that?
Continuing my research, from codeanticode:
P3D on the desktop will also fail to run if the video card doesn’t properly support OpenGL 2.0 and programmable shaders. Fortunately, most GPUs released over the past 3-4 years do support programmable pipelines, including the integrated chipsets from Intel.
The iMac supports up to OpenGL 3.2 AFAIK. It's got 6 GB RAM. Is it just too old to expect it to run a windowed shader at full FPS? Should this Q be in the hardware section?
Comparing the shader running under WebGL on the shadertoy.com page, it runs at ~26fps on the MacAir vs 10fps on the old iMac... Hope it's OK but I crossposted this on SO, because my WebGL performance was also affected.
Answers
Hi! I would say that both GPUs should be able to run the filter in hardware, with no software emulation involved.
I think that synthetic performance metrics can be misleading. I haven't used GpuTest but maybe it measures some overall "theoretical" quantity, like max. triangle throughput.
The HD 5000 GPU is much newer than the GeForce 8800, and even though both seem to have similar performance according to GpuTest, fragment shader throughput (which is the determining factor for this kind of filter) in the HD 5000 could still be higher than in the 8800.
Also, I don't know if you are updating the image in every frame (the original NTSC shadertoy uses a video, so in that case data is being copied over to GPU memory constantly), but the MacAir has a much faster CPU/chipset (i7) and memory transfer speeds than the iMac (Core 2). This could affect performance as well if you are doing frequent CPU-GPU transfers.
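To put a rough number on the CPU-GPU transfer point, here is a small back-of-the-envelope sketch in plain Java. The 640x360 frame size, the RGBA format (4 bytes per pixel) and the 60fps target are all assumptions for illustration, not values taken from the sketch in question, and UploadEstimate is just a hypothetical helper name.

```java
public class UploadEstimate {
    // Bytes needed to upload one uncompressed frame to the GPU.
    static long bytesPerFrame(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    // Bytes per second if a new frame is uploaded on every draw call.
    static long bytesPerSecond(int width, int height, int bytesPerPixel, int fps) {
        return bytesPerFrame(width, height, bytesPerPixel) * fps;
    }

    public static void main(String[] args) {
        int width = 640;        // assumed texture width
        int height = 360;       // assumed texture height
        int bytesPerPixel = 4;  // assumed RGBA, 8 bits per channel
        int fps = 60;           // assumed target frame rate

        long perFrame = bytesPerFrame(width, height, bytesPerPixel);
        long perSecond = bytesPerSecond(width, height, bytesPerPixel, fps);
        System.out.printf("%d bytes per frame, %.1f MB/s at %d fps%n",
                perFrame, perSecond / 1e6, fps);
    }
}
```

With these assumed values that is about 55 MB/s of texture data. Plugging in the real image size gives a figure to weigh against the fragment shader cost; it says nothing about per-upload overhead in the driver, which this estimate does not model.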
Thanks a lot for your help anticode. It seems that the higher results on GpuTest for the older CPU/GPU could be due to a particular test favouring the CPU over GPU. But my question of whether there's a configuration problem is answered.
Perhaps with weak GPUs some effects will run faster rewritten for the CPU?
Hmm, it's quite tricky to decide what baseline hardware compatibility to expect on a project... I guess the answer is to offer as many options as possible, e.g. turn the filter on/off.
Thanks again
Sure, you are welcome!
Again, for this particular type of effect (i.e. most, if not all, of those available on shadertoy or glsl sandbox) the bottleneck is fragment shader performance, since the scenes are generated procedurally pixel by pixel (unless you are doing additional work that involves CPU computations in each frame, CPU-GPU transfers, etc.). If you look at the sketch code (run by the CPU) in the Processing ports of these effects, it is in fact fairly minimal (typically a single rect covering the entire screen to feed the fragment shader with pixels).
Sometimes you can add tweakable parameters to the shader itself to make it less intensive on older hardware, at the cost of decreased visual quality. For example, in the NTSC filter, it seems that most of the computation is spent in the "for (float n = -41.0; n < 42.0; n += 4.0)" loop inside the NTSCCodec() function - you have to keep in mind that this loop is executed for each pixel on the screen. By narrowing the range of the loop you can make it run faster, but of course this will affect the output of the filter.
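To get a feel for how much work that loop adds up to, here is a minimal plain-Java sketch that counts its iterations per pixel and per frame. Only the original loop bounds (-41.0, 42.0, step 4.0) come from the shader; the 640x360 window size and the narrower -21.0 to 22.0 range are assumptions picked for illustration, and LoopCost is a hypothetical helper name.

```java
public class LoopCost {
    // Counts the iterations of: for (float n = start; n < end; n += step)
    static int iterations(float start, float end, float step) {
        int count = 0;
        for (float n = start; n < end; n += step) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int width = 640;   // assumed window width
        int height = 360;  // assumed window height
        long pixels = (long) width * height;

        // Loop bounds as quoted from the NTSC shader.
        int original = iterations(-41.0f, 42.0f, 4.0f);
        // Hypothetical narrower range, roughly half as wide.
        int reduced = iterations(-21.0f, 22.0f, 4.0f);

        System.out.println("original: " + original + " iterations per pixel, "
                + pixels * original + " per frame");
        System.out.println("reduced:  " + reduced + " iterations per pixel, "
                + pixels * reduced + " per frame");
    }
}
```

The original bounds give 21 iterations per pixel and the narrower range gives 11, so the loop's share of the work drops by roughly half. How that translates into frame rate depends on the GPU and on what else the shader does, and the narrower range does change what the filter looks like.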