Lately I have been interested in the idea of regression testing large numbers of Processing sketches. These would be a set of simple sketches -- like the many simple sketches in the Processing reference -- and they could be run through some kind of automated system that notices if they no longer work in the same way due to a changed version of Processing / a library / a mode etc.
I am seeking feedback on this approach. Could it be simplified, improved? Would it be worth a library wrapper?
The approach: the sketch creates a UTest object in setup(), then wraps its drawing code in draw() between UTest.begin() and UTest.end().
Here is an example for testing, taken from the Processing triangle() reference page:
triangle(30, 75, 58, 20, 86, 75);
Here is the regression test version of that sketch:
UTest test;

void setup() {
  test = new UTest("https:" + "//processing.org/reference/images/triangle_.png");
  test.setup();
}

void draw() {
  test.begin();
  triangle(30, 75, 58, 20, 86, 75);
  test.end();
}
...and here is the (rough, first draft) class:
class UTest {
  PImage imgTest, imgGoal, imgDiff;
  String goal = "goal.png";
  String test = "test.png";
  String diff = "diff.png";
  String imgURL;

  UTest(String url) {
    imgURL = url;
    this.setup();
  }

  void setup() {
    // remove previous results
    fileDelete(sketchPath(test));
    fileDelete(sketchPath(diff));
    fileDelete(sketchPath("diff.txt"));
    // cache the goal image -- only download the reference once
    File f = new File(sketchPath(goal));
    if (!f.exists()) {
      imgGoal = loadImage(imgURL);
      imgGoal.save(goal);
    } else {
      imgGoal = loadImage(goal);
    }
  }

  void begin() {
    // currently a no-op -- marks the start of the code under test
  }

  void end() {
    // save the test frame and load it back as an image
    saveFrame(test);
    imgTest = loadImage(test);
    // save the per-pixel difference image
    PImage pdiff = diff(imgGoal, imgTest);
    pdiff.save(diff);
    // save the difference statistics
    diffStats(pdiff);
    noLoop();
  }

  PImage diff(PImage pa, PImage pb) {
    PImage pc = createImage(pa.width, pa.height, RGB);
    pa.loadPixels();
    pb.loadPixels();
    pc.loadPixels();
    color c1, c2;
    for (int i = 0; i < pa.pixels.length; i++) {
      c1 = pa.pixels[i];
      c2 = pb.pixels[i];
      // absolute per-channel difference
      pc.pixels[i] = color(abs(red(c1)-red(c2)), abs(green(c1)-green(c2)), abs(blue(c1)-blue(c2)));
    }
    pc.updatePixels();
    return pc;
  }

  void diffStats(PImage p) {
    p.loadPixels();
    int diff = 0;
    int maxdiff = 0;
    color c = color(255, 255, 255);
    for (int i = 0; i < p.pixels.length; i++) {
      diff += red(p.pixels[i]) + green(p.pixels[i]) + blue(p.pixels[i]);
      maxdiff += (int)(red(c) + green(c) + blue(c));
    }
    String ratiodiff = String.format("%.2f", diff/(float)maxdiff);
    saveStrings("diff.txt", split(ratiodiff + " " + diff + "/" + maxdiff, ' '));
    println("error: ", ratiodiff, diff + " / " + maxdiff);
  }

  void fileDelete(String filename) {
    File f = new File(filename);
    if (f.exists()) {
      f.delete();
    }
  }
}
Currently the two outputs are virtually identical and so the test would pass; if the triangle() function changed, the test would register a high pixel error rate in the diff.txt output file, and the test would fail.
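As a hypothetical illustration of the "automated system" side (this helper is not part of the UTest draft, and the threshold value is arbitrary), a harness sketch could read the ratio that diffStats() writes as the first token of diff.txt and flag anything above a chosen limit:

void checkResult(String diffTxtPath, float threshold) {
  String[] lines = loadStrings(diffTxtPath);
  float ratio = float(lines[0]);   // first token is the diff ratio, e.g. "0.00"
  if (ratio <= threshold) {
    println("PASS: " + diffTxtPath + " (" + ratio + ")");
  } else {
    println("FAIL: " + diffTxtPath + " (" + ratio + ")");
  }
}

// e.g. checkResult(sketchPath("diff.txt"), 0.01);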
Comments
Some people who participated in an earlier discussion of testing Processing code and code-checking -- @strauss @Trilobyte @quark @koogs @Lord_of_the_Galaxy -- might be interested in these testable sketch outcomes.
This is written from scratch, but the concept is based on more complex code that I use in my personalized exercise generator as discussed there.
The concept is good, I'll look into it as soon as I can.
It seems to me that the core of the technique is to compare two images taken from a single sketch running on different versions of Processing, a mode and/or a contributed library.
You could avoid creating a difference image by simply comparing the two image pixel arrays. Also, you could use bit manipulation to avoid the Processing methods red(), green() and blue(), because they are very slow. When I was creating the Steganos library I discovered that only the 4 most significant bits of a colour channel gave a significant visual colour change, so you only need to consider those bits. So I have created this sketch which allows you to experiment with these ideas.
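quark's experimental sketch isn't reproduced above, but as a rough illustration of the masked, in-memory comparison he describes (the helper name and the exact mask value here are assumptions, not his code), something like this compares the two pixel arrays directly without calling red(), green() or blue():

float maskedDiffFraction(PImage a, PImage b) {
  a.loadPixels();
  b.loadPixels();
  int mask = 0xF0F0F0;             // keep only the top 4 bits of R, G and B
  int differing = 0;
  for (int i = 0; i < a.pixels.length; i++) {
    // direct int comparison -- no per-channel method calls
    if ((a.pixels[i] & mask) != (b.pixels[i] & mask)) {
      differing++;
    }
  }
  return differing / (float) a.pixels.length;   // fraction of visibly different pixels
}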
@jeremydouglass Are you familiar with test driven development in p5.js? https://p5js.org/tutorials/tdd.html
I believe this is something similar to your intentions, but in Processing. I don't know much about it as I just went through it recently. They explain the concept and have some sample code. In your case, a simple set of functions to test the core should not be too difficult to define. One would need to know the layout of the core library. One could also check what tests are implemented for p5.js and translate them to Processing... or at least they would provide a starting point.
Something that piqued my interest is how they tested a fill color operation. In the testing phase, they redefined the fill operation in order to do a fast check. If they ran the test through the actual program, it could take hours or days to go through all the possible colors. They proposed a solution where the colors are generated as numbers (the result is the same) but it doesn't depend on the actual frameRate. I was disappointed that the instructions on the web page are outdated and didn't work out of the box; there is some troubleshooting that needs to be done to get the instructions on that page working.
Are there test driven development tools in Java? A quick online search tells me about JUnit; there are also Spock and TestNG.
To clarify, I have been referring to TDD, although in this case this is not strictly TDD as the code is already implemented. Here is a comment meaningful to this conversation:
Question:
Answer:
Is this what you have in mind?
Kf
@quark -- thank you, these are very useful optimization suggestions.
The reason for creating the difference image is that, if a test fails, it leaves an artifact that can be visually inspected during debugging. However, I think your approach of doing an in-memory check makes sense -- perhaps only save the diff image if the test fails, or don't make the test itself dependent on the save.
Interesting bit-shifting approach to the difference threshold. Because renders of the "same" Processing code aren't always absolutely pixel-exact, there does need to be an error threshold. I'm not sure that the least significant bits will always work for setting the threshold, but I'll try it -- the mechanism could be made more complex later if necessary (e.g. testing an antialiasing setting).
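For example, end() in the UTest class above could be reworked along these lines (a rough sketch only: the method name, the pass/fail return value, and any concrete tolerance figure are illustrative, not part of the draft):

boolean endWithTolerance(float tolerance) {
  saveFrame(test);
  imgTest = loadImage(test);
  PImage pdiff = diff(imgGoal, imgTest);
  // the same sum-of-channel-differences ratio that diffStats() computes
  pdiff.loadPixels();
  float sum = 0;
  float maxSum = pdiff.pixels.length * 3 * 255.0f;
  for (int i = 0; i < pdiff.pixels.length; i++) {
    sum += red(pdiff.pixels[i]) + green(pdiff.pixels[i]) + blue(pdiff.pixels[i]);
  }
  float ratio = sum / maxSum;
  if (ratio > tolerance) {
    pdiff.save(diff);   // only leave the visual diff artifact behind on failure
  }
  noLoop();
  return ratio <= tolerance;
}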
@kfrajer -- you are right, this example of a regression test is related to test driven development (TDD) -- specifically, it is related to a discussion as part of the GSoC project Processing.R, on "Add unit test cases into Processing.R".
Unit testing the mode itself can be done using JUnit -- or tests of R examples might be done in RUnit, although at present RUnit doesn't run on renjin.
However a huge amount of the Processing API is being imported directly into Processing.R without any implementation in the mode. In theory the resulting sketches will all work the same -- but we don't know that. So I was mocking up in Java a quick way to produce an output check for R sketches that would compare them to the same ("known good") output from Java sketches. The idea being that, rather than writing unit tests on individual functions, you could wrap a test library around a few hundred documentation sketches and get good coverage of "expected results" -- and quickly see if anything is breaking, or if any R mode documentation sketches aren't using the API correctly.
This approach has some serious drawbacks -- it doesn't work well with interactivity, timed events, randomness etc. I'm just trying to think of ways of creating lots of test coverage of a huge transpiled API with minimum test writing effort. Also, a big push for milestone 1 on Processing.R is to create documentation (with example sketches). How do we confirm that these are correct and stay correct as development continues?
I have looked at the test code in p5.js; it has a similar idea. I think it is helpful for Processing and Processing.R. Although it has some drawbacks, it is the most practical way, IMO.
As for TDD/BDD, it is not suitable for Processing.R, I think, because the behaviors are defined in Processing and we already have the code base. There are many TDD/BDD testing tools in Java, but I think JUnit + JaCoCo is enough for Processing.R.
WDYT?
Sounds like the old Acid2 test for browsers, but for Processing?
http://acid2.acidtests.org/
A mere suggestion - for sketches that involve randomness, you could use randomSeed() so that the output is reproducible.
@gaocegege -- how funny that I was working through this as an original idea, but p5.js had already been doing reference-image based unit testing since 2014.
@prince_polka -- interesting comparison to the Acid tests. I hadn't remembered that the final Acid2 / Acid3 test was to do a per-pixel comparison to the reference rendering.
@Lord_of_the_Galaxy -- good point about using randomSeed().
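For instance (a minimal illustration -- the seed value and the drawing are arbitrary), fixing the seed makes a sketch that uses random() render the same image on every run, so it can still be diffed against a stored reference:

void setup() {
  size(100, 100);
  noLoop();
  randomSeed(42);                // same seed -> same sequence of random() values
}

void draw() {
  background(255);
  for (int i = 0; i < 20; i++) {
    line(random(width), random(height), random(width), random(height));
  }
  saveFrame("test.png");         // repeatable output, so it can be compared to goal.png
}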
@jeremydouglass Your thing could be useful nonetheless, since you're doing it for Processing Java.
The outcome of this in developing the Processing.R mode was an approach to documentation.
Each Processing.R documentation code snippet plus image is used to generate an end-to-end test. The test asks: "does this code plus saveFrame produce this image?" If the pre-rendered reference image is the same as the live image created by the code, the test passes; if the two images are different, the test fails.
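To make that concrete, a generated test might look roughly like the following -- an illustrative JUnit 4 sketch in which the class name, file names, and the strict exact-match assertion are assumptions for the example, not the project's actual template:

import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.junit.Test;

public class TriangleSnippetTest {

  @Test
  public void snippetOutputMatchesReferenceImage() throws Exception {
    // assumes the snippet has already been run with saveFrame("actual.png"),
    // and that the documentation reference image was saved as expected.png
    BufferedImage expected = ImageIO.read(new File("expected.png"));
    BufferedImage actual = ImageIO.read(new File("actual.png"));

    assertEquals(expected.getWidth(), actual.getWidth());
    assertEquals(expected.getHeight(), actual.getHeight());

    int[] expectedPixels = expected.getRGB(0, 0, expected.getWidth(), expected.getHeight(),
                                           null, 0, expected.getWidth());
    int[] actualPixels = actual.getRGB(0, 0, actual.getWidth(), actual.getHeight(),
                                       null, 0, actual.getWidth());
    assertArrayEquals(expectedPixels, actualPixels);
  }
}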
Just like jeremy said, you can see the template of the Java unit test at https://github.com/gaocegege/Processing.R/blob/master/hack/generate-e2e-test.py#L86. It works well for static sketches.
There are some problems during the implementation: