Why is red() slower than bit-fiddling?

(Perhaps this has been asked before? Couldn't find it searching the forum...)

The Processing reference says that using the built-in red() function will be slower than direct bit-fiddling. Running some test code...

color c = color(255,0,0);
int numRuns = 10000000;

println(numRuns + " runs...");

// use red()
long startTime = millis();
for (int i=0; i<numRuns; i++) {
  float r = red(c);
}
println("red():        " + (millis()-startTime) + "ms");

// use bit-fiddling
startTime = millis();
for (int i=0; i<numRuns; i++) {
  float r = c >> 16 & 0xFF;
}
println("bit-fiddling: " + (millis()-startTime) + "ms");

I see that red() is about 2–3 times slower, and that its speed varies quite a bit.

Why is this?

Answers

  • Took me a little while, but here's the source code for the red() function:

    public final float red(int rgb) {
        float c = (rgb >> 16) & 0xff;
        if (colorModeDefault) return c;
        return (c / 255.0f) * colorModeX;
    }
    
  • So it seems the answer might be "converting each pixel from int to float takes longer than not doing it" -- and returning float is part of the red() signature. Being able to return a float value is necessary if using an alternate color mode -- but inefficient otherwise.
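To make the two code paths concrete, here is a minimal plain-Java sketch of the logic quoted above. The class name and the static fields `colorModeDefault` and `colorModeX` are stand-ins chosen for illustration (in Processing they belong to the sketch itself), and `redBits` is a hypothetical helper for the bit-fiddling version.

```java
// Plain-Java sketch of the quoted red() logic; not Processing's actual class.
class RedChannelSketch {
    // Hypothetical stand-ins for the fields used in the quoted source.
    static boolean colorModeDefault = true;
    static float colorModeX = 255.0f;

    // Mirrors the quoted red(): mask out bits 16-23, then scale if needed.
    static float red(int rgb) {
        float c = (rgb >> 16) & 0xff;   // implicit int -> float conversion
        if (colorModeDefault) return c;
        return (c / 255.0f) * colorModeX;
    }

    // The bit-fiddling alternative: same mask, no conversion, no branch.
    static int redBits(int rgb) {
        return (rgb >> 16) & 0xff;
    }

    public static void main(String[] args) {
        int c = 0xFFFF0000;             // opaque pure red, packed as ARGB
        System.out.println(red(c));     // default mode: 0..255 range
        System.out.println(redBits(c));

        colorModeDefault = false;
        colorModeX = 1.0f;              // assumed alternate range of 0..1
        System.out.println(red(c));     // same pixel, rescaled
    }
}
```

The only extra work in the default mode is the int-to-float conversion and one branch, which suggests any measured gap should be small once the call is inlined.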

  • don't forget the function-call overhead

  • Thanks jeremydouglass and koogs. On further testing, I see some pretty uneven results. Below is my testing code – maybe there's something causing it to give weird results?

    long numRuns = 100000000;
    color c =      color(255,0,0);
    
    
    void setup() {  
      println(numRuns + " runs...\n");
    
    
      // use P5 red() command
      long startTime = millis();
      for (long i=0; i<numRuns; i++) {
        float r = red(c);
      }
      println("P5 red():                       " + (millis()-startTime) + "ms");
    
    
      // use source code for red()
      startTime = millis();
      for (long i=0; i<numRuns; i++) {
        float r = c >> 16 & 0xFF;
        if (true) { };
        //return (c / 255.0f) * colorModeX;
      }
      println("P5 red() source:                " + (millis()-startTime) + "ms");
    
    
      // use bit-fiddling (float)
      startTime = millis();
      for (long i=0; i<numRuns; i++) {
        float r = c >> 16 & 0xFF;
      }
      println("bit-fiddling (float):           " + (millis()-startTime) + "ms");
    
    
      // use bit-fiddling (int)
      startTime = millis();
      for (long i=0; i<numRuns; i++) {
        int r = c >> 16 & 0xFF;
      }
      println("bit-fiddling (int):             " + (millis()-startTime) + "ms");
    
    
      // bit-fiddling (int) as a function
      startTime = millis();
      for (long i=0; i<numRuns; i++) {
        int r = redFunct(c);
      }
      println("bit-fiddling (int) as function: " + (millis()-startTime) + "ms");
    
      exit();
    }
    
    int redFunct(int rgb) {
      int r = (rgb >> 16) & 0xFF;
      return r;
    }
    
  • Can you say something specific about what your "weird results" are, and why they are weird? People with different hardware / software might not be seeing what you are seeing.

    The JVM might aggressively optimize / inline test code like this, so you aren't always measuring what you think you are measuring.
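One way to make such a test harder to optimize away is to use every result and to warm up the code before timing it. The following is a rough plain-Java sketch under those assumptions. All names in it are illustrative, it does not call Processing's red(), and the JIT may still transform the loops in ways that skew the numbers.

```java
// Plain-Java timing sketch; all names here are illustrative, not Processing API.
class RedBenchSketch {
    // Float-returning extraction, like red() in the default color mode.
    static float redAsFloat(int rgb) { return (rgb >> 16) & 0xFF; }

    // Int-returning extraction, the plain bit-fiddling version.
    static int redAsInt(int rgb) { return (rgb >> 16) & 0xFF; }

    // Builds a different opaque color per iteration so the input is not loop-invariant.
    static int colorFor(int i) { return 0xFF000000 | ((i & 0xFF) << 16); }

    // Accumulates every result so the loop body has an observable effect.
    static double sumFloat(int runs) {
        double sum = 0;
        for (int i = 0; i < runs; i++) sum += redAsFloat(colorFor(i));
        return sum;
    }

    static long sumInt(int runs) {
        long sum = 0;
        for (int i = 0; i < runs; i++) sum += redAsInt(colorFor(i));
        return sum;
    }

    public static void main(String[] args) {
        int runs = 100_000_000;

        // Warm-up passes give the JIT a chance to compile both loops before timing.
        double sink = 0;
        for (int w = 0; w < 5; w++) sink += sumFloat(1_000_000) + sumInt(1_000_000);

        long t0 = System.nanoTime();
        double f = sumFloat(runs);
        long t1 = System.nanoTime();
        long n = sumInt(runs);
        long t2 = System.nanoTime();

        System.out.println("float: " + (t1 - t0) / 1_000_000 + " ms, checksum " + f);
        System.out.println("int:   " + (t2 - t1) / 1_000_000 + " ms, checksum " + n);
        System.out.println("warm-up checksum " + sink);
    }
}
```

Printing the checksums matters as much as the timings: a result that is never read is exactly the kind of work the JVM is free to drop.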

  • Sorry, should have included my output. I get really differing numbers every time I run my sketch. Sometimes red() is just as fast, sometimes twice as slow or worse. The Processing docs state so unequivocally that red() is worse than bit-fiddling (which is confusing for newbies and students) that I wondered whether that should be changed or whether it remains a good suggestion.

  • Hmm. Not sure.

    At some point, in theory, optimizations (on compile or run) might make that "red() is worse" claim not true anymore -- but my experience until recently has been that in multi-frame production sketches it always is, and significantly so.

    I tried your sketch as something that might be a closer approximation of a "real test" -- loading an image of 4096 x 4096 = 16,777,216 pixels, analyzing its red channel over and over again in a draw() loop, and producing output based on the analysis. In theory that should be slightly more resistant to the shortcutting of useless code (all your test computations could potentially be pruned, since they are redundant and produce no output). Still, I'm not seeing much difference at all over time between the different options, even when I crank it up to 8000+ square or more and floor the framerate.
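The actual test sketch was not posted, so the following is only a guess at its general shape, written in plain Java: a synthetic pixel array stands in for the loaded image, a loop stands in for draw(), and the per-frame mean of the red channel is printed so the analysis has visible output. The class and method names are hypothetical.

```java
// Plain-Java stand-in for a per-frame red-channel analysis; not the original sketch.
class RedChannelMeanSketch {
    // Mean red value (0..255) of packed ARGB pixels, using int bit-shifts only.
    static double meanRed(int[] pixels) {
        long sum = 0;
        for (int p : pixels) sum += (p >> 16) & 0xFF;
        return pixels.length == 0 ? 0.0 : (double) sum / pixels.length;
    }

    public static void main(String[] args) {
        // Smaller than the 4096 x 4096 image described above, to keep memory modest.
        int w = 1024, h = 1024;
        int[] pixels = new int[w * h];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = 0xFF000000 | ((i % 256) << 16);  // synthetic red ramp
        }

        // Stand-in for draw(): repeat the analysis and print a result each "frame".
        for (int frame = 0; frame < 10; frame++) {
            long t0 = System.nanoTime();
            double mean = meanRed(pixels);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("frame " + frame + ": mean red " + mean + ", " + ms + " ms");
        }
    }
}
```

Swapping the loop body for a float-returning helper and comparing the per-frame times after the first few frames is one way to repeat the comparison on your own hardware.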

  • Yeah, this gets past my CS knowledge on compilers, etc. I may put it in as a pull request on the docs and see. Maybe just re-writing them to say "red() may be slower, consider this other option" instead of something so declarative.

    Super interesting everyone, thanks.

  • ... instead of something so declarative.

    Processing's reference is full of "must"s and "have to"s, even though that's not true 99% of the time!
