Best way to sort through a 6,000~ element array based on distance?

I have a .csv file with roughly 6k entries. They are cities with associated latitude and longitude.

I have a map on the screen that pops up when you run the sketch. Pixel coords <--> lat/long is working fine.

I want the user to be able to click on the map and to find the nearest city and print out its name.

I know how to do this in an inefficient way, such as this code/psuedocode (hope that's ok!)

float shortestDistance = 999;

for (int i = 0; i < data.length; i++) {
  float testDistance = dist(data[i].lat, data[i].lon, pixelToMap(mouseX), pixelToMap(mouseY));
  if (testDistance < shortestDistance) {
    shortestDistance = testDistance;
    shortestIndex = i;
  }
}
println(data[shortestIndex].city);

... something like that anyway. I just feel like there is probably a smarter way to do this, and this is an area of programming that I am pretty unfamiliar with. Any help is greatly appreciated.

Answers

  • Pre-process the list for each element, and store the index of the nearest city with the city object?

    data[i].nearestCity = indexOfOtherCity;

  • So, you don't really want a sort, but just find a minimum.

    Your code is actually fine, since the data will change each time it is run.

    Perhaps pre-compute the pixelToMap() calls (before the loop), to avoid additional computations.

  • edited November 2013

    NOTE: The algorithm I made up does not work for all cases after all as koogs pointed out, scroll down to see why. Didn't imagine that scenario when I thought it up.

    Pick one of the dimensions, say x. Sort all of the points by x (this takes O(n log n) time but only has to be done once).

    Use a binary search to find the point closest on the x-axis and determine its distance to your point. This will take O(log n) time to find the point in the sorted list.

    Test the distance going left in the list until you find a point with a larger distance than the last one you tested. The last one you tested is the closest on the left side. This has to be done because sorting by x didn't give any information about y.

    Do the same thing going right. Compare the distances of the "best" points on the left and right side (1 comparison).

    In the worst case this would still take O(n) time like your example but it is extremely unlikely that this would happen unless it was set up for worst case on purpose.

    Edit: After the initial sorting, the worst case for this would be 13 steps for the binary search (log base 2 of 6,000) + some number to the left + some number to the right + 1.

  • dist() does an expensive sqrt calculation, you can get rid of that by squaring both sides

    int COUNT = 6000;
    PVector[] pos = new PVector[COUNT];
    
    void setup() {
      for (int i = 0 ; i < COUNT ; i++) {
        pos[i] = new PVector(random(1000), random(1000));
      }
      float x = random(1000);
      float y = random(1000);
    
      // your version
      long start = millis();
      int shortestIndex = 0;
      float shortestDistance = 1000000;
      for (int i = 0 ; i < COUNT ; i++) {
        float testDistance = dist(pos[i].x, pos[i].y, x, y);
        if (testDistance < shortestDistance) {
          shortestDistance = testDistance;
          shortestIndex = i;
        }
      }
      long end = millis();
      println("Method A: (" + shortestIndex + ") in " + (end - start) + "ms");
    
      // my version    
      start = millis();
      for (int i = 0 ; i < COUNT ; i++) {
        float testDistance = ((pos[i].x - x) * (pos[i].x - x)) + ((pos[i].y - y) * (pos[i].y - y));
        if (testDistance < shortestDistance) {
          shortestDistance = testDistance;
          shortestIndex = i;
        }
      }
      end = millis();
      println("Method B: (" + shortestIndex + ") in " + (end - start) + "ms");
      noLoop();
    }
    
    void draw() {
    }
    

    THAT SAID, the original takes 2ms. 2 milliseconds. it's barely worth worrying about.

  • (i've seen this done with many more points where the space was divided up into a grid. then when the mouse was clicked you only needed to check the points in the square you clicked in and the eight surrounding squares, which cuts down the work considerably. won't work if the distribution is such that you have empty squares though - there's a chance of finding nothing at all)

  • asimes

    asimes, does your algorithm work with the above situation? the red dot is closest to the black on the x axis. the blue is further away so, as i understand it, you'd choose the red dot as nearest. but the green dot is actually closer.

  • edited November 2013

    float testDistance = ((pos[i].x - x) * (pos[i].x - x)) + ((pos[i].y - y) * (pos[i].y - y));

    Don't forget Processing got the sq() function! :D

    final float testDist = sq(pos[i].x - x) + sq(pos[i].y - y);
    
  • edited November 2013

    This class Colour I've made uses both Arrays.sort() & Arrays.binarySearch()!

    File "RGB.txt":

    Black, 0
    Blue,0,0, 255
    Cyan, 0, 255, 255
    Gray ,           128
    Tan, 210, 180, 140
    Green, 0, 128, 0,,,
    Magenta, 255, 0, 255
    Red, 255, 0, 0
    White, -1
    Yellow, 255, 255, 0
    Saddle Brown, 140, 70, 20
    Orange, 255, 165, 0
    Tomato, 255, 100, 70
    Pink, 255, 192, 203
    Gold, 255, 215, 0
    Khaki, 240, 230, 140
    Indigo, 75, 0, 130
    Dark Slate Blue, 70, 60, 140
    Olive, 128, 128, 0
    Teal, 0, 128, 128
    Dim Gray, 105
    

    And program:

    /** 
     * Colour Container II (v2.31)
     * by GoToLoop (2013/Sep)
     * 
     * forum.processing.org/topic/
     * file-of-colors-that-can-be-loaded-for-all-my-programs-
     * hope-this-is-a-simple-question
     */
    
    import java.util.Arrays;
    
    final String[] myColors = {
      "Tomato", "Gold", "Green", "Saddle Brown", "Teal", "Green", "Tan"
    };
    
    Colour[] palette;
    
    final static String NAME = "RGB.txt", SEPARATOR = ",";
    
    void setup() {
      size(600, 400);
      frameRate(.5);
    
      palette = loadPalette(NAME, SEPARATOR);
    
      println(palette);
      println();
    }
    
    void draw() {
      final int idx = frameCount % myColors.length;
      final String name = myColors[idx];
    
      final int search = Arrays.binarySearch(palette, name);
    
      final Colour found = palette[search];
      final color c = found.get();
    
      background(c);
    
      println("#" + nf(search, 2) + "  " + found);
    
      frame.setTitle("Index: " + idx + "\tColor: " 
        + found.name + "\tCode: " + hex(c, 6));
    }
    
    Colour[] loadPalette(String path, String delim) {
      final String[] lines = loadStrings(path);
      final int len = lines.length;
    
      final Colour[] RGBs = new Colour[len];
    
      for (int i = 0; i != len;) {
        final String[] units = split(lines[i], delim);
        final String name = units[0];
        final color[] channels = int( trim(subset(units, 1)) );
    
        RGBs[i++] = new Colour(name, channels);
      }
    
      Arrays.sort(RGBs);
      return RGBs;
    }
    
    class Colour implements Comparable {
      short R, G, B;
      String name;
    
      Colour(String label, color r, color g, color b) {
        name = label;
        set(r, g, b);
      }
    
      Colour(String label, color c) {
        name = label;
        set(c);
      }
    
      Colour(String label, color[] a) {
        name = label;
        set(a);
      }
    
      void set(color r, color g, color b) {
        R = (short) constrain(r, 0, 0xFF);
        G = (short) constrain(g, 0, 0xFF);
        B = (short) constrain(b, 0, 0xFF);
      }
    
      void set(color c) {
        if (c >= 0 & c < 0400) {
          setGrey(c);
          return;
        }
    
        R = (short) (c >> 020 & 0xFF);
        G = (short) (c >> 010 & 0xFF);
        B = (short) (c & 0xFF);
      }
    
      void set(color[] a) {
        final int len = a.length;
    
        if (len == 1) {
          if (a[0] >= 0 & a[0] < 0400)  setGrey(a[0]);
          else                          set(a[0]);
    
          return;
        }
    
        if (len > 0)  R = (short) constrain(a[0], 0, 0xFF);
        if (len > 1)  G = (short) constrain(a[1], 0, 0xFF);
        if (len > 2)  B = (short) constrain(a[2], 0, 0xFF);
      }
    
      void clear() {
        R = G = B = 0;
      }
    
      void setGrey(color c) {
        R = G = B = (short) constrain(c, 0, 0xFF);
      }
    
      color get() {
        return color(R, G, B);
      }
    
      color[] toArray() {
        return new color[] {
          R, G, B
        };
      }
    
      String name() {
        return name.toLowerCase().trim();
      }
    
      String toString() {
        return "\"" + name + "\"  \t [ " + R + ", " + G + ", " + B + " ]";
      }
    
      int compareTo(Object other) {
        return other instanceof Colour ? 
        name.compareTo( ( (Colour) other ).name ) : 
        name.compareTo( (String) other );
      }
    }
    
  • I go along with @PhiLho - all you need is to search for the minimum distance and your code does that.

    It is a simple O(n) problem that for 6000 items is not worth the effort of trying to write sophisticated algorithms to save a few nanoseconds, especially since the code will only be executed on a mouse click and will the user notice if it takes an extra millisecond or so to show the name of the city, I think not.

  • @koogs, Ah I guess not then. I didn't do a real proof, just drew some dots on a piece of paper. I'll put a note up next to it

  • I guess I didn't realize it would be so quick. I only have a small dataset I made to test it, I keep forgetting to grab the big 6k entry database from home ;p

    Thank you all so much for your help, I found every suggestion very useful and made the necessary changes! I'll let you know if it's slow for some reason.

    One thing I have in mind--not that I'd implement it--but I'm curious how you might go about having the search wrap around the globe. For instance, if I click way off to the right of my map, which ends to the east of Asia, it won't recognize that Hawaii is the closest place, rather than Cairo. Thoughts?

    THX!

  • edited November 2013

    For instance, if I click way off to the right of my map, which ends to the east of Asia, it won't recognize that Hawaii is the closest place, rather than Cairo. Thoughts?

    If the longitudinal (east<>west) distance between the cities (d) > half the map width (mapwidth) then the distance in the other direction is shorter (mapwidth - d). I have also assumed that the x direction is longitude (east<>west) and y direction is latitude (north<>south). Your original post appears to have these swapped.

    float shortestDistance = Float.MAX_VALUE;
    
    float mwidth = ???; // you will know this
    float mapwidth2 = mapwidth / 2; // save dong this for each city
    for (int i = 0; i < data.length; i++) {
      float deltaX = abs(data[i].lon - pixelToMap(mouseX));
      if(deltaX > mapWidth2)
        deltaX = mapWidth - deltaX;
      float deltaY = data[i].lat - pixelToMap(mouseY));
      float d2 = deltaX * deltaX + deltaY * deltaY; // distance squared
      if (d2 < shortestDistance) {
        shortestDistance = d2;
        shortestIndex = i;
      }
    }
    println(data[shortestIndex].city); 
    
Sign In or Register to comment.