cloister
Full Member
Offline
Posts: 138
Help me understand this optimization failure.
Feb 17th, 2009, 7:35pm
Ok, this is just weird, and I don't know what to make of it. I've got some code that does a simple least-squares fit of a line against a set of pixels. The points are fixed (x,y) locations within an image, and the pixels are the output of an edge-detection function: black where there are no edges, bright pixels at the detected edges. The goal is to estimate the slope of the edge at location (x,y).

So what I do is sample some angles from 0 to 180 degrees and, for each one, generate the coefficients of a line passing through that point at that angle (the ax+by+c=0 form of a line). Then I examine a rectangular box of pixels around the (x,y) location. For each pixel within the box that is brighter than some cutoff value, I calculate the squared distance from that pixel to the line under consideration and accumulate it. So we have:

loop over angles {
    loop over y-values in the box {
        loop over x-values in the box {
            get brightness of pixel;
            if greater than cutoff {
                calculate and accumulate the squared distance of that pixel to the line;
            }
        }
    }
}

It's a lot of nested looping, so I thought I'd try to make it faster. In particular, since the set of pixels brighter than the cutoff is fixed for a given (x,y) location, it seems silly to loop over that box of pixels and test each one with the brightness() function every time. Why not find those pixels first and put their coordinates in an array, then run the calculation against that list of points rather than against the pixels directly?

On paper, it sounds great: loop over the pixels in the box only once, instead of once for every angle I consider (which, since I sample the angle every 5 degrees, is 36 times). Except that doing this makes the code run about six times slower. Here's the code as it stands now:

// Estimate the tangent angle of the edge at pixel (x,y); returns an angle in radians.
// (x,y) is the location for which to estimate the slope; e is the image containing
// the results of the edge-detection algorithm.
float estimateTangent(int x, int y, PImage e) {
  int boxRadius = 5;           // search an 11x11 (11 = 2*5+1) box around (x,y)
  int xmin, xmax, ymin, ymax;  // the bounding coordinates of the box (may be smaller
                               // than 11x11 if the box falls off the edge of the image)
  float theta, bestTheta = 0.0;
  float error, bestError;
  float dSquared;

  xmin = max(0, x-boxRadius);
  ymin = max(0, y-boxRadius);
  xmax = min(e.width-1, x+boxRadius);
  ymax = min(e.height-1, y+boxRadius);

  // Find and count the bright pixels in the box around (x,y).
  // To avoid having to loop over the box once for each theta estimation, we make lists
  // of the coordinates of the bright points within the box and then work off of those.
  // To avoid having to count the points before creating these arrays, we just size the
  // arrays to as many pixels as could possibly be in the box.
  int[] px, py;
  int numPoints = 0;
  px = new int[(2*boxRadius+1) * (2*boxRadius+1)];
  py = new int[(2*boxRadius+1) * (2*boxRadius+1)];
  for(int row = ymin; row <= ymax; row++)
    for(int col = xmin; col <= xmax; col++)
      if(brightness(e.pixels[row*e.width + col]) >= cutoff) {
        px[numPoints] = col;
        py[numPoints++] = row;
      }

  bestError = float(boxRadius*boxRadius * (boxRadius*2+1) * (boxRadius*2+1));

  for(theta = 0; theta < PI; theta += 5.0 / 57.2957798) {  // step by 5 degrees, converted to radians
    // Calculate values of a, b, c for the ax+by+c=0 formula, given angle theta and point (x,y).
    // m is the slope of the line in y=mx+i (slope-intercept) form, from which we derive
    // the a, b, c coefficients of the ax+by+c=0 form of the line, which is needed for
    // the perpendicular distance formula d(x,y) = |ax+by+c| / sqrt(a^2+b^2).
    float m = -sin(theta) / cos(theta);  // use -sin() because our coordinates have y increasing down the image
    float i = y - m*x;                   // the y-intercept of the line
    float a = m,   // these variables are just aliases for code clarity;
          b = -1,  // obviously, we could use m and i directly
          c = i;

    // Loop over the bright pixels; for each one, calculate its squared distance and add it
    // to the cumulative error. Some algebra on the perpendicular distance formula yields
    // d^2(x,y) = (ax+by+c)^2 / (a^2+b^2). The denominator of that result is constant over
    // the loop, so we pre-calculate it.
    error = 0.0;
    float denominator = a*a + b*b;
    for(int j = 0; j < numPoints; j++)
      error += (a*px[j] + b*py[j] + c)*(a*px[j] + b*py[j] + c) / denominator;

    if(error < bestError) {  // best one so far? If so, remember it.
      bestError = error;
      bestTheta = theta;
    }
  }
  return bestTheta;
}

The way it was before I tried this optimization eliminated all the stuff about creating and filling the px[] and py[] arrays, but had this:

for(int row = ymin; row <= ymax; row++)
  for(int col = xmin; col <= xmax; col++)
    if(brightness(e.pixels[row*e.width + col]) >= cutoff)
      error += (a*col + b*row + c)*(a*col + b*row + c) / denominator;

instead of:

for(int j = 0; j < numPoints; j++)
  error += (a*px[j] + b*py[j] + c)*(a*px[j] + b*py[j] + c) / denominator;

So there you have it. Any idea why my "optimized" code is six times slower than the naive implementation, which does lots of extra looping and works directly off the pixels in PImage e?
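As a sanity check on the distance algebra used in the inner loop, here's a tiny standalone Java snippet (plain Java rather than Processing, and the line and test point are arbitrary values I made up, not from the sketch). It computes d^2 = (ax+by+c)^2 / (a^2+b^2) for a case where the answer is known by hand:

```java
public class DistanceCheck {
    // Squared perpendicular distance from (px, py) to the line ax + by + c = 0,
    // the same expression accumulated in the inner loop of estimateTangent:
    // d^2 = (a*px + b*py + c)^2 / (a^2 + b^2).
    static double distSquared(double a, double b, double c, double px, double py) {
        double num = a * px + b * py + c;
        return num * num / (a * a + b * b);
    }

    public static void main(String[] args) {
        // Line y = x in slope-intercept form: m = 1, i = 0, so a = m = 1, b = -1, c = i = 0.
        // The point (0, 2) lies at perpendicular distance 2/sqrt(2) = sqrt(2) from it,
        // so the squared distance is exactly 2.
        double d2 = distSquared(1, -1, 0, 0, 2);
        System.out.println(d2); // prints 2.0
    }
}
```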
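One way to narrow this down is to reproduce just the two inner-loop variants outside of Processing. The sketch below is a minimal harness, assuming a flat int[] array of brightness values as a stand-in for PImage and its brightness() call (both assumptions, made so it runs in plain Java); since both variants visit the same bright pixels in the same order, they accumulate identical sums, and any timing difference between them would have to come from the loop restructuring itself rather than from anything Processing-specific:

```java
import java.util.Random;

public class LoopCompare {
    // Stand-ins for the sketch's environment: an 11x11 "image" of brightness
    // values (0..255) and the brightness cutoff. These are assumptions, not
    // values taken from the original PImage.
    static final int W = 11, H = 11, CUTOFF = 128;
    // Arbitrary line coefficients for ax + by + c = 0, fixed for both variants.
    static final double A = 1.0, B = -1.0, C = 3.0;

    // Variant 1: test every pixel against the cutoff (mirrors the original
    // "naive" inner loop that reads pixels directly).
    static double errDirect(int[] img) {
        double denom = A*A + B*B, err = 0.0;
        for (int row = 0; row < H; row++)
            for (int col = 0; col < W; col++)
                if (img[row*W + col] >= CUTOFF) {
                    double d = A*col + B*row + C;
                    err += d * d / denom;
                }
        return err;
    }

    // Variant 2: collect bright-pixel coordinates into arrays first, then loop
    // over that list (mirrors the "optimized" version).
    static double errCached(int[] img) {
        int[] px = new int[W*H], py = new int[W*H];
        int n = 0;
        for (int row = 0; row < H; row++)
            for (int col = 0; col < W; col++)
                if (img[row*W + col] >= CUTOFF) { px[n] = col; py[n++] = row; }
        double denom = A*A + B*B, err = 0.0;
        for (int j = 0; j < n; j++) {
            double d = A*px[j] + B*py[j] + C;
            err += d * d / denom;
        }
        return err;
    }

    // A reproducible random test image (fixed seed).
    static int[] randomImage() {
        int[] img = new int[W * H];
        Random rng = new Random(42);
        for (int k = 0; k < img.length; k++) img[k] = rng.nextInt(256);
        return img;
    }

    public static void main(String[] args) {
        int[] img = randomImage();
        // Same bright pixels, same order, same arithmetic => identical sums.
        System.out.println(errDirect(img) == errCached(img)); // prints true
    }
}
```

Wrapping each variant in a System.nanoTime() timing loop (e.g. 36 repetitions, to match the 36 angle samples) would show whether the six-times slowdown reproduces here; if it doesn't, the cost is presumably in something this harness leaves out, such as the per-call array allocations or the brightness() calls, rather than in the loop structure.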