Pixel test for simple sketches -- feedback request

Lately I have been interested in the idea of regression testing large numbers of Processing sketches. These would be a set of simple sketches -- like the many examples in the Processing reference -- run through some kind of automated system that notices when they no longer work the same way after a change to Processing, a library, a mode, etc.

I am seeking feedback on this approach. Could it be simplified, improved? Would it be worth a library wrapper?

The approach:

  1. run a standard reference sketch
  2. save a screen frame
  3. compare the frame to a known-good documentation image
  4. record the pixel error ratio for automated test tools.

The implementation works like this:

  1. in setup, load the known-good documentation image into a UTest object
  2. in draw, wrap the simple sketch commands in UTest.begin() and UTest.end()
  3. running the sketch generates test output

Here is an example for testing, taken from the Processing triangle() reference page:

 triangle(30, 75, 58, 20, 86, 75);

Here is the regression test version of that sketch:

UTest test;

void setup() {
  test = new UTest("https://processing.org/reference/images/triangle_.png");
  test.setup();
}
void draw() {
  test.begin();
  triangle(30, 75, 58, 20, 86, 75);
  test.end();
}

...and here is the (rough, first draft) class:

class UTest {
  PImage imgTest, imgGoal, imgDiff;
  String goal = "goal.png";
  String test = "test.png";
  String diff = "diff.png";
  String imgURL;
  UTest(String url) {
    imgURL = url;
    // setup() is called explicitly from the sketch's setup(), so it is not repeated here
  }
  void setup() {
    // remove previous results
    fileDelete(sketchPath(test));
    fileDelete(sketchPath(diff));
    fileDelete(sketchPath("diff.txt"));
    // cache the known-good goal image so it is only downloaded once
    File f = new File(sketchPath(goal));
    if (!f.exists()) {
      imgGoal = loadImage(imgURL);
      imgGoal.save(goal);
    } else {
      imgGoal = loadImage(goal);
    }
  }
  void begin() {
  }
  void end() {
    // save test
    saveFrame(test);
    imgTest = loadImage(test);
    // save diff
    PImage pdiff = diff(imgGoal, imgTest);
    pdiff.save(diff);
    // save diffStats
    diffStats(pdiff);
    noLoop();
  }
  PImage diff(PImage pa, PImage pb) {
    // assumes both images have the same dimensions
    pa.loadPixels();
    pb.loadPixels();
    PImage pc = createImage(pa.width, pa.height, RGB);
    pc.loadPixels();
    color c1, c2;
    for (int i = 0; i < pa.pixels.length; i++) {
      c1 = pa.pixels[i];
      c2 = pb.pixels[i];
      pc.pixels[i] = color(abs(red(c1)-red(c2)), abs(green(c1)-green(c2)), abs(blue(c1)-blue(c2)));
    }
    pc.updatePixels();
    return pc;
  }
  void diffStats(PImage p) {
    p.loadPixels();
    int diff = 0;
    int maxdiff = p.pixels.length * 3 * 255;  // worst case: every channel maximally different
    for (int i = 0; i < p.pixels.length; i++) {
      diff += red(p.pixels[i]) + green(p.pixels[i]) + blue(p.pixels[i]);
    }
    String ratiodiff = String.format("%.2f", diff / (float) maxdiff);
    saveStrings("diff.txt", new String[] { ratiodiff, diff + "/" + maxdiff });
    println("error: ", ratiodiff, diff + " / " + maxdiff);
  }
  void fileDelete(String filename) {
    File f = new File(filename);
    if (f.exists()) {
      f.delete();
    }
  }
}

Currently the two outputs are virtually identical, so the test passes; if the triangle() function changed, the test would register a high pixel error ratio in the diff.txt output file, and the test would fail.
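
For an automated runner, one possible follow-up step (my own rough sketch, not part of UTest) is to read diff.txt back in and pass or fail against a tolerance -- the 0.01 threshold here is just a placeholder assumption:

void checkDiffRatio() {
  String[] lines = loadStrings("diff.txt");
  float errorRatio = float(lines[0]);  // first line holds the pixel error ratio
  float tolerance = 0.01;              // assumed threshold -- tune per test
  if (errorRatio > tolerance) {
    println("FAIL: error ratio " + errorRatio + " exceeds tolerance " + tolerance);
  } else {
    println("PASS: error ratio " + errorRatio);
  }
}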


Comments

  • edited June 2017

    Some people who participated in an earlier discussion of testing Processing code and code-checking -- @strauss @Trilobyte @quark @koogs @Lord_of_the_Galaxy -- might be interested in these testable sketch outcomes.

    This is written from scratch, but the concept is based on more complex code that I use in my personalized exercise generator as discussed there.

  • The concept is good; I'll look into it as soon as I can.

  • edited June 2017

    It seems to me that the core of the technique is to compare two images taken from a single sketch running on different versions of Processing, a mode, and/or a contributed library.

    You could avoid creating a difference image by simply comparing the two image pixel arrays. You could also use bit manipulation to avoid the Processing methods red(), green() and blue(), because they are very slow.

    When I was creating the Steganos library, I discovered that only the 4 most significant bits of a colour channel produce a significant visual colour change, so you only need to consider those bits. I have created this sketch, which allows you to experiment with these ideas.


    PImage img0, img1;
    
    void setup() {
      size(600, 400);
      background(255);
      img0 = getImage(0, 0, color(220, 23, 111), color(45, 128, 127), color(22, 23, 67), 1.2);
      img1 = getImage(1, 1, color(230, 11, 128), color(12, 132, 128), color(32, 45, 96), 2.5);
      image(img0, 0, 0);
      image(img1, img0.width, 0);
      float propDifferent = imgDifference(img0, img1);
      println(propDifferent);
      fill(0);
      text("Difference factor " + propDifferent, 10, height - 10);
    }
    
    /**
     Returns the proportional difference between two images.
     Multiply by 100 for a percentage
     */
    float imgDifference(PImage i0, PImage i1) {
      float diff = 0;
      i0.loadPixels();
      int[] ip0 = i0.pixels;
      i1.loadPixels();
      int[] ip1 = i1.pixels;
      for (int n = 0; n < ip0.length; n++) {
        int pxl0 = ip0[n], r0, g0, b0;
        int pxl1 = ip1[n], r1, g1, b1;
        r0 = (pxl0 >> 20) & 0xF;
        g0 = (pxl0 >> 12) & 0xF;
        b0 = (pxl0 >> 4) & 0xF;
        r1 = (pxl1 >> 20) & 0xF;
        g1 = (pxl1 >> 12) & 0xF;
        b1 = (pxl1 >> 4) & 0xF;
        diff += abs(r0 - r1) + abs(g0 - g1) + abs(b0 - b1);
      }
      // Each colour channel can have a difference 0-15
      // Considering 3 colour channels (ignoring alpha)
      return diff / (ip0.length * 3 * 15);
    }
    
    /**
     dx, dy translate the ellipse
     bg background colour
     fl ellipse fill colour
     st ellipse stroke colour
     sw stroke thickness
     */
    PImage getImage(float dx, float dy, int bg, int fl, int st, float sw) {
      PGraphics pg = createGraphics(width/2, height - 40);
      pg.beginDraw();
      pg.background(bg);
      pg.fill(fl);
      pg.stroke(st);
      pg.strokeWeight(sw);
      pg.translate(dx, dy);
      pg.ellipse(pg.width/2, pg.height/2, 0.8 * pg.width, 0.8 * pg.height);
      pg.endDraw();
      return pg.get();
    }
    

  • @jeremydouglass Are you familiar with test-driven development in p5.js? https://p5js.org/tutorials/tdd.html

    I believe this is similar to your intentions, but in Processing. I don't know much about it, as I only went through it recently. They explain the concept and have some sample code. In your case, a simple set of functions to test the core should not be too difficult to define. One would need to know the layout of the core library. One could also check what tests are implemented for p5.js and translate them to Processing... or at least that would provide some starting point.

    Something that piqued my interest is how they tested a fill color operation. In the testing phase, they redefined the fill operation in order to do a fast check. If they ran the test through the actual program, it could take hours or days to go through all the possible colors. They proposed a solution where the colors are generated as numbers (they are the same), so it doesn't depend on the actual frameRate. I was disappointed that the instructions on the webpage are outdated and didn't work out of the box; some troubleshooting is needed to get the instructions on that page working.

    Are there test-driven development tools in Java? A quick online search points me to JUnit; there are also Spock and TestNG.

    To clarify, I am referring to TDD, although in this case it is not strictly TDD, since the code is already implemented. Here is a comment relevant to this conversation:

    Question:

    I wrote about 3k line code so how can i add tdd without messed up my entire methods

    Answer:

    The point of TDD is to write the unit tests first, THEN write your 3k (or however many) lines of code. You can certainly (and I recommend you do) add unit tests for the existing code, but doing so is not TDD. That said, you can do TDD going forward for any new functionality (or bug fixes).

    Is this what you have in mind?

    Kf

  • @quark -- thank you for the suggestions; these are very useful optimizations.

    The reason for creating the difference image is that, if a test fails, it leaves an artifact that can be visually inspected during debugging. However, I think your approach of doing an in-memory check makes sense -- perhaps only save the diff image if the test fails, or don't make the test itself dependent on the save.

    Interesting bit-shifting approach to the difference threshold. Because renders of the "same" Processing code aren't always absolutely pixel-exact, there does need to be a threshold for error. I'm not sure whether dropping the least significant bits will always work for setting the threshold, but I'll try it -- the mechanism could be made more complex later if necessary (e.g. testing an antialiasing setting). A rough sketch combining these ideas follows.
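
    A rough, untested sketch of what UTest.end() might become with these suggestions folded in -- reusing the imgDifference() function from @quark's sketch above, an assumed tolerance field (say 0.01), and the diff image written only when the test fails:

    void end() {
      saveFrame(test);
      imgTest = loadImage(test);
      float errorRatio = imgDifference(imgGoal, imgTest);  // in-memory, bit-shifted comparison
      boolean passed = errorRatio < tolerance;             // tolerance: assumed float field, e.g. 0.01
      if (!passed) {
        // only leave a diff image behind as a debugging artifact when the test fails
        diff(imgGoal, imgTest).save(diff);
      }
      saveStrings("diff.txt", new String[] { nf(errorRatio, 0, 4), passed ? "PASS" : "FAIL" });
      println(passed ? "PASS" : "FAIL", errorRatio);
      noLoop();
    }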

  • edited June 2017

    @kfrajer -- you are right, this example of a regression test is related to test driven development (TDD) -- specifically, it is related to a discussion as part of the GSOC project Processing.R on Add unit test cases into Processing.R.

    Unit testing the mode itself can be done using JUnit -- or tests of R examples might be done in RUnit, although at present RUnit doesn't run on Renjin. (A minimal JUnit sketch along these lines appears at the end of this comment.)

    However, a huge amount of the Processing API is being imported directly into Processing.R without any implementation in the mode. In theory the resulting sketches will all work the same -- but we don't know that. So I was mocking up in Java a quick way to produce an output check for R sketches that would compare them to the same ("known good") output from Java sketches. The idea is that, rather than writing unit tests on individual functions, you could wrap a test library around a few hundred documentation sketches and get good coverage of "expected results" -- and quickly see if anything is breaking, or if any R mode documentation sketches aren't using the API correctly.

    This approach has some serious drawbacks -- it doesn't work well with interactivity, timed events, randomness, etc. I'm just trying to think of ways of creating lots of test coverage for a huge transpiled API with minimum test-writing effort. Also, a big push for milestone 1 on Processing.R is to create documentation (with example sketches). How do we confirm that these are correct and stay correct as development continues?
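
    For illustration only, a minimal JUnit 4 test along these lines might look like the sketch below -- the class name, the assumption that goal.png and test.png already exist on disk, and the 0.01 tolerance are all placeholders, not actual Processing.R code:

    import static org.junit.Assert.assertTrue;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;
    import org.junit.Test;

    public class TriangleExampleTest {
      @Test
      public void rendersLikeReference() throws Exception {
        // assumes the sketch has already been run and has saved test.png next to goal.png
        BufferedImage goal = ImageIO.read(new File("goal.png"));
        BufferedImage test = ImageIO.read(new File("test.png"));
        double ratio = pixelErrorRatio(goal, test);
        assertTrue("pixel error ratio too high: " + ratio, ratio < 0.01);
      }

      // proportion of the maximum possible RGB difference, mirroring diffStats() above
      static double pixelErrorRatio(BufferedImage a, BufferedImage b) {
        long diff = 0;
        long max = (long) a.getWidth() * a.getHeight() * 3 * 255;
        for (int y = 0; y < a.getHeight(); y++) {
          for (int x = 0; x < a.getWidth(); x++) {
            int ca = a.getRGB(x, y), cb = b.getRGB(x, y);
            diff += Math.abs(((ca >> 16) & 0xFF) - ((cb >> 16) & 0xFF))
              + Math.abs(((ca >> 8) & 0xFF) - ((cb >> 8) & 0xFF))
              + Math.abs((ca & 0xFF) - (cb & 0xFF));
          }
        }
        return diff / (double) max;
      }
    }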

  • edited June 2017

    I have looked at the test code in p5.js (the testRender function below); it has a similar idea. I think it is helpful for Processing and Processing.R. Although it has some drawbacks, it is the most practical way, IMO.

    As for TDD/BDD, I think it is not suitable for Processing.R, because the behaviors are already defined in Processing and we already have the code base. There are many TDD/BDD testing tools in Java, but I think JUnit + JaCoCo is enough for Processing.R.

    WDYT?

    var testRender = function(file, sketch, callback) {
      sketch.loadPixels();
      var p = sketch.pixels;
      var ctx = sketch;
      sketch.clear();
      sketch.loadImage(file, function(img) { // TODO: Handle case where file doesn't load
        ctx.image(img, 0, 0, 100, 100);
        ctx.loadPixels();
        var n = 0;
        for (var i=0; i<p.length; i++) {
          for (var j=0; j<4; j++) {
            var diff = Math.abs(p[i][j] - ctx.pixels[i][j]);
            n += diff;
          }
        }
        var same = (n/(256*4*p.length)) < 0.015;
        same = same && (ctx.pixels.length === p.length);
        callback(same);
      });
    }
    
  • Sounds like the old Acid2 test for browsers, but for Processing?

    http://acid2.acidtests.org/

  • A mere suggestion -

    • You can write code to actually execute both the "known" sketch and the test. You may choose any of the several methods by which this can be done.
    • Then, for the mouse/keyboard part, you can choose between using the Java Robot API or writing a simple implementation within Processing itself. Note that the first requires you to know how to use Robot, and also how to calculate relative positions correctly (for the mouse).
    • As for random things, I'd simply recommend using randomSeed() with the same values for both runs (a minimal illustration follows below).
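
    A minimal illustration of that last point -- both the reference run and the test run would fix the seeds before drawing anything (the seed value 42 is arbitrary):

    void setup() {
      size(100, 100);
      randomSeed(42);   // same fixed seed in the reference sketch and in the test sketch
      noiseSeed(42);    // same idea if the sketch uses noise()
      for (int i = 0; i < 50; i++) {
        point(random(width), random(height));
      }
    }
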
  • @gaocegege -- how funny that I was working through this as an original idea, but p5.js had already been doing reference-image based unit testing since 2014.

    @prince_polka -- interesting comparison to the Acid tests. I hadn't remembered that the final Acid2 / Acid3 test was to do a per-pixel comparison to the reference rendering.

    @Lord_of_the_Galaxy -- good point about using randomSeed().

  • @jeremydouglass Your thing could be useful nonetheless, since you're doing it for Processing Java.

  • The outcome of this in developing the Processing.R mode was a documentation-driven approach to testing.

    Each Processing.R documentation code snippet plus image is used to generate an end-to-end test. The test asks: "does this code plus saveFrame produce this image?" If the pre-rendered reference image is the same as the live image created by the code, the test passes; if the two images are different, the test fails.

  • edited November 2017

    As Jeremy said, you can see the template of the Java unit test at https://github.com/gaocegege/Processing.R/blob/master/hack/generate-e2e-test.py#L86. It works well for static sketches.

    There are some problems during the implementation:

    • The generated image and the online version are hard to compare, since they may have been saved on different operating systems or machines.
    • There is no way to generate animations for the code, so animated sketches are not covered.