Loading...
Logo
Processing Forum
I'm looking for suggestions on improving some code that currently runs really slow (like 10 minutes or longer for a low-res image) - hopefully this description below makes sense!  Full code is a bit long for the forum, but posted here.

I'd like to start with "seed pixels" (hundreds to tens of thousands, depending on the image) and:
  • Store all neighboring pixels in a list (up, right, down, and left)
  • BUT, make sure that there are no duplicates in the resulting list of locations
  • To further complicate, I'd like to do this iteratively, expanding out from a previously gathered set over and over.  This also means not gathering pixels that were gathered in previous states of the system.
Currently I iterate all seed pixels, finding their neighbors with +/- from their position in the pixel array.  The duplication-check is being done with:
Copy code
  1. !Arrays.asList(newPositions).contains(positionToCheck)
This seems to work fine and, when commented out, doesn't seem to be what's bogging down my code, though each iteration makes the system go slower than the last.

I've looked into Set and HashMap as alternatives to standard arrays, but I need to preserve the pixel's position and original color, making things a little tricker.

Ideas at this point:
  • Store a boolean array of previously traversed pixels - check the new positions against that (if the check is what's slowing things down)?
  • Some kind of image-mask applied to source - perhaps the underlying image-processing algorithms are optimized for this kind of 2d matrix?

Whew! All that to say: any ideas on optimization?

Replies(7)

Maybe using 'append' for arrays is the problem?  Would ArrayList be more efficient?
Without reading your code:
Couldn't you make a full array for all points? Since you iterate always over the same fields of points. Initiate all with -1 and when your neighbour is left say field [x-1][y]
No need to append.
I didn't look at your code either, but it should not be that slow ( edit: by that I mean your goal program should not require slowness). I would use an integer ArrayList to keep track of the "parent" pixels. The loop would be something like this:

- Initialize the "parent" ArrayList with your seeds (the pixels doing the neighbor lookups)
- For every "parent" pixel add the neighbors to the ArrayList if they have not been flagged by the boolean array. If they have not been flagged, flag them so they can't be a "child" of a different "parent" and remove the "parent" from the ArrayList.

It is important to search through the ArrayList backwards so that you can make use of myArrayList.add() efficiently and without checking "children" during the same loop as they are being added. Pseudocode:
Copy code
  1. for (int i = parents.size(); i >= 0; i--) {
  2.   int upIndex = parents.get(i).neighborUp();
  3.   if (!flagged[upIndex]) {
  4.     parents.add(upIndex);
  5.     flagged[upIndex] = true;
  6.   }
  7.   
  8.   // Do the same for right, down, left
  9.   
  10.   parents.remove(i);
  11. }
I suspect that the solution is to use a collection that only accepts unique values. You have already mentioned a couple the Set and HashMap but I should point out that the Set is an interface so you would need a concrete class such as HashSet or TreeSet.

When using an ArrayList or LinkedList searching the list is sequential, in other words the time taken to traverse the list is proportional to the number of elements in the list. So if the list size doubles then the time taken to traverse it (such as using contains) doubles and in this case the problem is searching for duplicates so they can be avoided.

Both the Set and Map type collections are very fast at storing and finding elements inside them, and are faster than the ArrayList and LinkedList when the data set becomes large. They also have the advantage of only storing unique values.

The speed is based on how the data is stored inside the collection, and here you should be aware that Tree and Hash collections are very different internally. The TreeMap and HashMap collections need to specify a key which uniquely identifies a value.

For instance a passport number (key) uniquely identifies a person (value). In this program the key will be an xy coordinate. To use a HashMap or TreeMap then we need a class to represent the key and another class to represent the value.

The PixelPos class below has been designed to work as a key for either the TreeMap or HashMap collection you would need to design a class to represent the value (information about the pixel).

It maybe the actual value class is virtually none existant, for instance you may only want to store the pixel color which is an int. In this case rather than create a seperate class add this to the key class and dispense with the value class. Effectively the key is also the value and if you do this then use the TreeSet or HashSet alternatives.

There are a number of advantages over using these collections in that all the elements have unique keys. If you attempt to store an element with the same key (i.e. same hash-code or comparesTo returns zero) then it simply replaces the previous entry which is probably acceptable behaviour in your program. If not then simple use the contains method before inserting the data.

HTH

Copy code
  1. // Must implement Comparable interface for
  2. //tree collection
  3. public class PixelPos implements Comparable {
  4.     public int x;
  5.     public int y;

  6.     // Required for hash collections
  7.     public int hashCode(){
  8.         return (x + "," + y).hashCode();   
  9.     }

  10.     // Required for tree collections
  11.     public boolean equals(Object o){
  12.         PixelPos p = (PixelPos) o;
  13.         return (x == p.x && y == p.y);
  14.     }

  15.     // Required for tree collections
  16.     public int compareTo(Object o) {
  17.         PixelPos p = (PixelPos) o;
  18.         if(x == p.x && y == p.y)
  19.             return 0;
  20.         // If different on y then order by y
  21.         if(y != p.y)
  22.             return (y < p.y) ? -1 :1;
  23.         else // Same y then the x must differ
  24.             return (y < p.y) ? -1 :1;
  25.     }

  26. }

Copy code
  1. ArrayList<Integer> parents = new ArrayList<Integer>();
  2. boolean[] flagged;

  3. void setup() {
  4.   size(800, 800, P2D);
  5.   loadPixels();

  6.   flagged = new boolean[width*height];
  7.   for (int i = 0; i < 10; i++) {
  8.     int rand = (int)random(width*height);
  9.     parents.add(rand);
  10.     flagged[rand] = true;
  11.   }
  12. }

  13. void draw() {
  14.   for (int i = parents.size()-1; i >= 0; i--) {
  15.     int x = parents.get(i)%width;
  16.     int y = parents.get(i)/width;

  17.     int upIndex = parents.get(i)-width;
  18.     if (y > 0 && !flagged[upIndex]) {
  19.       parents.add(upIndex);
  20.       flagged[upIndex] = true;
  21.     }

  22.     int rightIndex = parents.get(i)+1;
  23.     if (x < width-1 && !flagged[rightIndex]) {
  24.       parents.add(rightIndex);
  25.       flagged[rightIndex] = true;
  26.     }

  27.     int downIndex = parents.get(i)+width;
  28.     if (y < height-1 && !flagged[downIndex]) {
  29.       parents.add(downIndex);
  30.       flagged[downIndex] = true;
  31.     }

  32.     int leftIndex = parents.get(i)-1;
  33.     if (x > 0 && !flagged[leftIndex]) {
  34.       parents.add(leftIndex);
  35.       flagged[leftIndex] = true;
  36.     }

  37.     parents.remove(i);
  38.   }

  39.   for (int i = 0; i < width*height; i++) {
  40.     if (flagged[i]) pixels[i] = 0xff000000;
  41.     else pixels[i] = 0xffffffff;
  42.   }
  43.   updatePixels();
  44. }
Wow, everyone - thanks for the great suggestions!

@quarks, thanks especially for the detailed reply on Sets, etc.  These struck me as likely a good call, but scared me a bit!  Probably a good reason to learn about them... :)

I'll have to chew on everyone's suggestions and see what happens.  Will report back (probably with further questions)!
Great suggestions all. 

@quarks, I'm definitely going to have to spend more time looking into your suggestions.  Time to learn some more powerful formats for bigger data!

However, @asimes' solution worked out of the box and was a real performance improvement - thanks!

Some creepy early results can be seen here:
http://www.jeffreythompson.org/blog/2013/01/11/seed-sorting