How to map large amounts of data points

I'm working on a project where I have to plot over 600,000 data points onto a map. I'm using Unfolding Maps, and the data points are given as latitude and longitude in CSV format. Because there are so many data points, the program is either extremely slow or sometimes shuts down on me. Is there any way I can combine the points within a location range, so that the program only has to plot one point instead of every point within, let's say, 5 miles? Any suggestions?

Answers

  • edited May 2015

    Do you work with different zoom levels on the map?

    Because if you do, the grouping would change with every zoom level.

    For your problem:

    Put a virtual grid over the map and count the points within each cell. When a cell has fewer than 3 points (x < 3), just draw the points. If x >= 3, draw a single circle or point in another color instead and put the value of x underneath it.

    The idea is called clustering; maybe check Wikipedia.

    ;-)
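    A minimal sketch of the virtual-grid idea in plain Java (not Processing or Unfolding code), assuming the points have already been converted to screen or map coordinates. The class and method names are made up for illustration, and placing the summary marker at the average position of its cell's points, rather than at the cell centre, is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Virtual-grid clustering: bucket points into square cells, then per cell
// either keep the raw points or replace them with one counted marker.
class GridCluster {
    record Point(double x, double y) {}
    record Cell(long cx, long cy) {}
    // count == 1 means "draw as a normal point"; count >= threshold means "draw a labelled circle".
    record Marker(double x, double y, int count) {}

    static List<Marker> cluster(List<Point> points, double cellSize, int threshold) {
        Map<Cell, List<Point>> cells = new HashMap<>();
        for (Point p : points) {
            Cell c = new Cell((long) Math.floor(p.x() / cellSize), (long) Math.floor(p.y() / cellSize));
            cells.computeIfAbsent(c, k -> new ArrayList<>()).add(p);
        }
        List<Marker> markers = new ArrayList<>();
        for (List<Point> members : cells.values()) {
            if (members.size() < threshold) {
                // Few points (x < threshold): keep them individually.
                for (Point p : members) markers.add(new Marker(p.x(), p.y(), 1));
            } else {
                // Crowded cell: one marker at the average position, labelled with the count.
                double sx = 0, sy = 0;
                for (Point p : members) { sx += p.x(); sy += p.y(); }
                markers.add(new Marker(sx / members.size(), sy / members.size(), members.size()));
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        List<Point> pts = List.of(
                new Point(1, 1), new Point(2, 2), new Point(3, 3),  // three points in one cell
                new Point(55, 55));                                  // a lone point elsewhere
        for (Marker m : cluster(pts, 10, 3)) System.out.println(m);
    }
}
```

    With a cell size of roughly 5 miles (in whatever units the points use), each crowded cell collapses to one marker, which is what the original question asks for.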

  • Also, don't store all your points in a single flat list: you'd have to iterate through all of them to find the ones to display. Partition them sensibly, and you only have to handle the relevant subsets, not everything every time.

    And don't redraw them unless you need to: if the data hasn't changed and the map hasn't moved, there's no need to update the screen.
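    One way to read "partition them sensibly" and "don't redraw unless you need to", sketched in plain Java under the assumption that the points are in flat map coordinates: points are bucketed into fixed-size tiles, a query only visits the tiles overlapping the visible rectangle, and the last answer is reused while the viewport stays the same. All names are hypothetical; this is not Unfolding's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Points bucketed into fixed-size tiles, so a query only visits tiles that
// overlap the visible area; the last result is reused until the view changes.
class TileIndex {
    record Point(double x, double y) {}
    record Tile(long tx, long ty) {}
    record Viewport(double minX, double minY, double maxX, double maxY) {}

    private final double tileSize;
    private final Map<Tile, List<Point>> tiles = new HashMap<>();
    private Viewport lastViewport;              // viewport the cached result belongs to
    private List<Point> lastVisible = List.of();
    int queriesComputed = 0;                    // how often the selection was actually recomputed

    TileIndex(double tileSize) { this.tileSize = tileSize; }

    private long tileOf(double v) { return (long) Math.floor(v / tileSize); }

    void add(Point p) {
        tiles.computeIfAbsent(new Tile(tileOf(p.x()), tileOf(p.y())), t -> new ArrayList<>()).add(p);
        lastViewport = null;                    // data changed: the cached result is stale
    }

    List<Point> visible(Viewport v) {
        if (v.equals(lastViewport)) return lastVisible;   // nothing moved, nothing new: reuse
        List<Point> out = new ArrayList<>();
        for (long tx = tileOf(v.minX()); tx <= tileOf(v.maxX()); tx++) {
            for (long ty = tileOf(v.minY()); ty <= tileOf(v.maxY()); ty++) {
                List<Point> bucket = tiles.get(new Tile(tx, ty));
                if (bucket == null) continue;             // empty tile
                for (Point p : bucket) {
                    boolean inside = p.x() >= v.minX() && p.x() <= v.maxX()
                            && p.y() >= v.minY() && p.y() <= v.maxY();
                    if (inside) out.add(p);
                }
            }
        }
        lastViewport = v;
        lastVisible = out;
        queriesComputed++;
        return out;
    }

    public static void main(String[] args) {
        TileIndex index = new TileIndex(10);
        index.add(new Point(5, 5));
        index.add(new Point(95, 95));
        Viewport view = new Viewport(0, 0, 20, 20);
        System.out.println(index.visible(view));  // only the point at (5, 5)
        index.visible(view);                      // same view: served from the cache
        System.out.println("recomputed " + index.queriesComputed + " time(s)");
    }
}
```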

  • No, I'm not working with any zooms. The problem is that I already did partition them; the original file was 80 GB of pure text data, hundreds of millions of lines :/

  • Good job crunching such a huge amount of data! Wow!

    Which country or countries are we talking about?

    If it's the US, you could do it state by state.

    What kind of data is it? If it's temperature, for example, you could average over regions.

    ;-)

  • It's just the US, and it's data on bird sightings over a period of 12 months.

  • So you could cluster them by state and just draw a circle with a number...

    The bigger the number, the bigger the circle.

    ;-)
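    A small sketch of the per-state version in plain Java, assuming each sighting row carries a state code (the CSV layout isn't given in the thread, so the Sighting record is hypothetical). Scaling the radius with the square root of the count, so that the circle's area rather than its radius is proportional to the number, is a common design choice and goes slightly beyond "the bigger the number, the bigger the circle".

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StateBubbles {
    record Sighting(String state, double lat, double lon) {}

    // Count sightings per state (TreeMap keeps the states in alphabetical order).
    static Map<String, Integer> countByState(List<Sighting> sightings) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Sighting s : sightings) counts.merge(s.state(), 1, Integer::sum);
        return counts;
    }

    // Radius grows with the square root of the count, so the circle's area is
    // proportional to the number of sightings; the biggest state gets maxRadius.
    static double radius(int count, int maxCount, double maxRadius) {
        return maxRadius * Math.sqrt((double) count / maxCount);
    }

    public static void main(String[] args) {
        List<Sighting> data = List.of(
                new Sighting("NY", 40.7, -74.0), new Sighting("NY", 42.9, -78.9),
                new Sighting("NY", 43.0, -76.1), new Sighting("NY", 44.5, -73.5),
                new Sighting("VT", 44.3, -72.6));
        Map<String, Integer> counts = countByState(data);
        int max = counts.values().stream().max(Integer::compare).orElse(1);
        counts.forEach((state, n) ->
                System.out.printf("%s: %d sightings, radius %.1f%n", state, n, radius(n, max, 40)));
    }
}
```

    In this example NY gets radius 40.0 and VT gets 20.0: VT has a quarter of NY's sightings, so its circle has half the radius and a quarter of the area.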

  • edited May 2015

    Do you have the species of the birds?

    Then you could filter by species or switch between them.
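    A sketch of switching between species in plain Java: group the sightings by species once while loading, so switching is just a lookup instead of another pass over the file. The Sighting record and the species names are hypothetical. With several species the whole index has to fit in memory; if it doesn't, writing one pre-filtered file per species and loading only the selected one is an alternative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SpeciesIndex {
    record Sighting(String species, double lat, double lon) {}

    private final Map<String, List<Sighting>> bySpecies = new HashMap<>();

    // Called once per row while reading the data.
    void add(Sighting s) {
        bySpecies.computeIfAbsent(s.species(), k -> new ArrayList<>()).add(s);
    }

    // Species available for a menu or key press, in alphabetical order.
    Set<String> species() {
        return new TreeSet<>(bySpecies.keySet());
    }

    // Switching species is a map lookup; unknown names give an empty list.
    List<Sighting> sightingsOf(String species) {
        return bySpecies.getOrDefault(species, List.of());
    }

    public static void main(String[] args) {
        SpeciesIndex index = new SpeciesIndex();
        index.add(new Sighting("American Robin", 40.7, -74.0));
        index.add(new Sighting("American Robin", 41.9, -87.6));
        index.add(new Sighting("Blue Jay", 39.9, -75.2));
        for (String name : index.species()) {
            System.out.println(name + ": " + index.sightingsOf(name).size());
        }
    }
}
```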

  • Also, you could just let each point search for its neighbours and, when a neighbour is nearer than some distance A, join them.
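    To decide whether two sightings are "nearer than A" when the data is raw latitude/longitude, a great-circle distance is more meaningful than comparing degrees, because a degree of longitude shrinks towards the poles. Here is a sketch in plain Java using the haversine formula with a mean Earth radius of 3958.8 miles; the class name and the 5-mile threshold in main are only illustrative.

```java
class GeoDistance {
    static final double EARTH_RADIUS_MILES = 3958.8;  // mean Earth radius

    // Great-circle distance in miles between two points given in degrees (haversine formula).
    static double miles(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against rounding pushing the argument of asin slightly above 1.
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // The "nearer than A" test: join two sightings if they are closer than thresholdMiles.
    static boolean shouldJoin(double lat1, double lon1, double lat2, double lon2, double thresholdMiles) {
        return miles(lat1, lon1, lat2, lon2) < thresholdMiles;
    }

    public static void main(String[] args) {
        // Two points 0.01 degrees of latitude apart are about 0.69 miles apart.
        System.out.println(miles(40.0, -74.0, 40.01, -74.0));
        System.out.println(shouldJoin(40.0, -74.0, 40.01, -74.0, 5.0));
    }
}
```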

  • I'm only doing one species, because more than one would be about a million lines, so to speed up the processing I only used one species. Eventually I'm going to try to figure out a way to handle several species and be able to switch between them. How would I make the points find their neighbouring points?

  • edited May 2015

    Hello!

    600,000 data points doesn't sound like that much to me.

    How much data are you using per point?

    What exactly do you want to do, step by step?

  • edited May 2015

    Good question!

    First idea

    When you have a static image with no movement at all...

    • You could just wait until the image has been drawn, save it to the hard drive, and from then on show that saved image.
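    The first idea sketched in plain Java with java.awt and javax.imageio: render the expensive picture once, write it to a PNG, and on later runs just load the file. In Processing itself, save() or saveFrame() and loadImage() would play these roles. The sketch assumes the points have already been projected to pixel coordinates, and the method name is hypothetical. The cached file has to be deleted whenever the data changes.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class CachedMap {
    // Load the cached picture if it exists; otherwise render it once and save it.
    static BufferedImage loadOrRender(File cache, int w, int h, double[][] points) throws IOException {
        if (cache.exists()) {
            return ImageIO.read(cache);              // fast path: no points are touched
        }
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setColor(new Color(200, 30, 30));
        for (double[] p : points) {
            // Slow part, done only once: one small dot per (pixel x, pixel y) point.
            g.fillOval((int) p[0] - 2, (int) p[1] - 2, 4, 4);
        }
        g.dispose();
        ImageIO.write(img, "png", cache);
        return img;
    }

    public static void main(String[] args) throws IOException {
        double[][] points = {{100, 100}, {120, 90}, {300, 200}};  // already in pixel coordinates
        BufferedImage img = loadOrRender(new File("map-cache.png"), 400, 300, points);
        System.out.println("map image is " + img.getWidth() + "x" + img.getHeight());
    }
}
```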

    Second idea

    • Anyway, first make a copy A of your sketch and put it somewhere safe.

    • Then make another copy B of your sketch.

    • Then make another copy C of your sketch.

    When you have a static image with no movement at all, you could run the clustering approach (finding the neighbour points) in an extra program B' (derived from B) that just writes a new data set to the hard drive containing

    • x, y, number of birds

    for each cluster.

    The number of birds might be anywhere from 1 to 99999 (a cluster holds just one bird when it has no really close neighbours).

    So in this approach, B' just reads the data, clusters it without displaying it, and writes the result back to the hard drive.

    Then modify your display sketch (derived from C) so that it can load and display the clusters written by B'.

    ;-)
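    A sketch of the file that B' would write and the display sketch would read, in plain Java, assuming one "x,y,count" line per cluster as described above. The record and method names are hypothetical; in Processing, saveTable() and loadTable() could serve the same purpose.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ClusterFile {
    record Cluster(double x, double y, int count) {}

    // Program B': write one "x,y,count" line per cluster.
    static void write(Path file, List<Cluster> clusters) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (Cluster c : clusters) {
                out.write(c.x() + "," + c.y() + "," + c.count());
                out.newLine();
            }
        }
    }

    // Display sketch: read the small cluster file instead of the raw sightings.
    static List<Cluster> read(Path file) throws IOException {
        List<Cluster> clusters = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] f = line.split(",");
                clusters.add(new Cluster(Double.parseDouble(f[0].trim()),
                        Double.parseDouble(f[1].trim()), Integer.parseInt(f[2].trim())));
            }
        }
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("clusters", ".csv");
        write(file, List.of(new Cluster(120.5, 80.0, 37), new Cluster(300.0, 210.0, 1)));
        System.out.println(read(file));
    }
}
```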

  • edited May 2015

    What you should do, in my opinion, is this:

    In setup():

    1) Get all the position data from your external file or from Unfolding Maps at the beginning.

    2) Convert each latitude/longitude pair into an XYZ position on a sphere and store them in an array.

    In draw():

    3) Check the distance between the previous position of the map's center and the current one. If the map has moved more than a well-defined minimum distance, do the following:

    4) Loop over all your positions and store every point located within the radius you want (a circle in 2D, or another sphere in 3D).

    5) Use Unfolding Maps just once to get the data for each of those points.

    6) Draw what you want to draw.
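    Steps 2 to 4 sketched in plain Java: latitude/longitude become points on a unit sphere once, and on each frame the selection is only recomputed when the map centre has moved more than minMove. Distances here are straight-line (chord) distances on the unit sphere, so a radius of about 5 miles corresponds to roughly 5 / 3958.8 ≈ 0.00126. The class and its parameters are hypothetical, and steps 5 and 6 (Unfolding and the drawing itself) are left out.

```java
import java.util.ArrayList;
import java.util.List;

class SphereSelector {
    record Vec3(double x, double y, double z) {
        double dist(Vec3 o) {
            double dx = x - o.x, dy = y - o.y, dz = z - o.z;
            return Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
    }

    // Step 2: latitude/longitude in degrees -> point on a unit sphere.
    static Vec3 toXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg), lon = Math.toRadians(lonDeg);
        return new Vec3(Math.cos(lat) * Math.cos(lon), Math.cos(lat) * Math.sin(lon), Math.sin(lat));
    }

    private final Vec3[] positions;          // converted once, as in setup()
    private Vec3 lastCenter;                 // map centre at the last selection
    private List<Integer> selected = List.of();

    SphereSelector(double[][] latLon) {
        positions = new Vec3[latLon.length];
        for (int i = 0; i < latLon.length; i++) positions[i] = toXYZ(latLon[i][0], latLon[i][1]);
    }

    // Steps 3 and 4, called every frame as in draw(): returns indices of points
    // within `radius` of `center`, recomputed only when the centre moved more than minMove.
    List<Integer> update(Vec3 center, double minMove, double radius) {
        if (lastCenter != null && lastCenter.dist(center) <= minMove) return selected;
        List<Integer> inside = new ArrayList<>();
        for (int i = 0; i < positions.length; i++) {
            if (positions[i].dist(center) < radius) inside.add(i);
        }
        lastCenter = center;
        selected = inside;
        return selected;
    }

    public static void main(String[] args) {
        SphereSelector s = new SphereSelector(new double[][] {{40.71, -74.01}, {40.73, -73.99}, {34.05, -118.24}});
        double fiveMiles = 5 / 3958.8;   // approximate chord length on the unit sphere
        System.out.println(s.update(toXYZ(40.72, -74.0), fiveMiles / 10, fiveMiles));
    }
}
```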

  • edited May 2015

    B' works like this: it loads all the points without displaying them.

    Keep them in a list with position x, y, number of birds, and a dead flag.

    Initially, number of birds = 1 and dead = false for every entry.

    You can either work inside this list or with a separate result list, I guess. Maybe there is a Wikipedia article on that?

    Anyway

    It has two nested for-loops:

    • the first from i = 1 to n-1,

    • the second from j = i+1 to n

    Now compare birds i and j.

    When the birds are not marked as dead:

    • When dist() between the two is < maxDist (find out what a good distance is), add a point to the result list whose position is the average of the two birds and whose number of birds is their sum (2).

    When dist() between the two is >= maxDist, just copy both into the result list.

    Now mark both birds as dead.

    When both for-loops are done, repeat the same thing a few times on the result list, I think. (Oops, it's getting vague.)

    Remember that when you add a bird to a cluster of 5 birds, the average must be weighted: the bird counts 5 times less than the cluster. Thus the new position is not in the middle but 5 times closer to the cluster. Or so.
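    Written out, the weighted average from the last paragraph, for a cluster at (x_c, y_c) holding n_c birds and a new bird group at (x_b, y_b) holding n_b birds, is:

```latex
x_{\text{new}} = \frac{n_c\,x_c + n_b\,x_b}{n_c + n_b}, \qquad
y_{\text{new}} = \frac{n_c\,y_c + n_b\,y_b}{n_c + n_b}, \qquad
n_{\text{new}} = n_c + n_b
```

    With n_c = 5 and n_b = 1 this gives x_new = x_c + (x_b - x_c)/6: the new position lies one sixth of the way from the cluster to the bird, so it is 5 times closer to the cluster than to the bird.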

    ;-)
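    A sketch of the B' loop in plain Java, with one deliberate change: a bird that is far from bird j is not copied and marked dead right away, because then it would only ever be compared with one other bird. Instead, each live bird i absorbs every later live bird within maxDist (using the weighted average), is then added to the result list, and passes are repeated on the result list until nothing merges. Names are hypothetical, and distances are plain Euclidean in whatever units x and y use. Each pass is O(n²): for 600,000 points that is about 1.8 × 10^11 comparisons, so in practice you would combine this with a grid and only compare birds in neighbouring cells.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyMerge {
    // One entry per bird or cluster: position plus how many birds it represents.
    static final class Bird {
        double x, y;
        int count;
        boolean dead = false;
        Bird(double x, double y, int count) { this.x = x; this.y = y; this.count = count; }
    }

    // One pass: every live bird i absorbs all later live birds within maxDist.
    static List<Bird> pass(List<Bird> birds, double maxDist) {
        List<Bird> result = new ArrayList<>();
        for (int i = 0; i < birds.size(); i++) {
            Bird a = birds.get(i);
            if (a.dead) continue;
            Bird cluster = new Bird(a.x, a.y, a.count);
            for (int j = i + 1; j < birds.size(); j++) {
                Bird b = birds.get(j);
                if (b.dead) continue;
                if (Math.hypot(cluster.x - b.x, cluster.y - b.y) < maxDist) {
                    int n = cluster.count + b.count;
                    // Weighted average: a cluster of 5 pulls 5 times harder than a single bird.
                    cluster.x = (cluster.x * cluster.count + b.x * b.count) / n;
                    cluster.y = (cluster.y * cluster.count + b.y * b.count) / n;
                    cluster.count = n;
                    b.dead = true;
                }
            }
            a.dead = true;
            result.add(cluster);
        }
        return result;
    }

    // Repeat passes on the result list until a pass merges nothing.
    static List<Bird> cluster(List<Bird> birds, double maxDist) {
        List<Bird> current = birds;
        while (true) {
            List<Bird> next = pass(current, maxDist);
            if (next.size() == current.size()) return next;
            current = next;
        }
    }

    public static void main(String[] args) {
        List<Bird> birds = List.of(new Bird(0, 0, 1), new Bird(1, 0, 1), new Bird(2, 0, 1), new Bird(50, 50, 1));
        for (Bird b : cluster(birds, 5)) System.out.println(b.x + "," + b.y + " -> " + b.count + " bird(s)");
    }
}
```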

  • Thank you for your comments! This has been such a task for me! I've only been working with code for about 4 months now, and this has gone way over my head. For now I'm just making it as a GIF, so I don't have to deal with the maps shutting down on me all the time, just to get a sense of what it's supposed to look like. Hopefully I'll be able to dedicate my summer to learning more and actually wrapping my head around your guys' suggestions, haha. Thank you so much!

  • I remember seeing this thread a little while ago and had meant to comment. It sounds like the problem is that you're trying to iterate over and display all that data dynamically every frame, so at some point you hit a processing limit.

    If you're just trying to generate a static image at the end of it, you could 'bake' the data into an image layer as each line of data is read: e.g. draw a semi-transparent circle at the sighting's location. As these semi-transparent layers build up, you'll get a stronger colour in the areas with a higher concentration.

    IIRC you can parse a text file one line at a time rather than trying to load the whole thing in one go, so in theory this approach would let you process any amount of data. The only limitation is the size of the single image that you're holding (and drawing to) in memory... I'm working on something that will use similar principles and may have some demo code to post in the next few weeks. Feel free to drop me a PM as a reminder if you don't see anything ;)
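    A sketch of the 'bake it as you read it' approach in plain Java: the file is read one line at a time with a BufferedReader, each sighting stamps a faint dot onto an offscreen image, and overlapping dots build up darker areas. Only the image stays in memory. The "lat,lon" line format and the bounding box (roughly the contiguous US, mapped linearly) are assumptions; in Processing, createReader() and an offscreen PGraphics could fill the same roles.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import javax.imageio.ImageIO;

class DensityBaker {
    // Assumed bounding box, roughly the contiguous US, mapped linearly onto the image.
    static final double MIN_LAT = 24, MAX_LAT = 50, MIN_LON = -125, MAX_LON = -66;

    // Reads "lat,lon" lines one at a time and stamps a faint dot for each;
    // only the image is held in memory, never the whole data set.
    static BufferedImage bake(Reader source, int w, int h, int alpha) throws IOException {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setColor(new Color(180, 0, 0, alpha));     // semi-transparent: overlaps get darker
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");
                if (f.length < 2) continue;            // skip malformed lines
                double lat, lon;
                try {
                    lat = Double.parseDouble(f[0].trim());
                    lon = Double.parseDouble(f[1].trim());
                } catch (NumberFormatException e) {
                    continue;                          // e.g. a header row
                }
                int x = (int) ((lon - MIN_LON) / (MAX_LON - MIN_LON) * w);
                int y = (int) ((MAX_LAT - lat) / (MAX_LAT - MIN_LAT) * h);
                g.fillOval(x - 3, y - 3, 6, 6);
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    // Usage: java DensityBaker sightings.csv density.png
    public static void main(String[] args) throws IOException {
        BufferedImage img = bake(new FileReader(args[0]), 1200, 600, 20);
        ImageIO.write(img, "png", new File(args[1]));
    }
}
```

    Lowering the alpha value makes single sightings fainter and lets dense areas take longer to saturate, which matters with hundreds of thousands of points.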
