We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
IndexProgramming Questions & HelpSyntax Questions › split array of points into local neighbourhoods
Page Index Toggle Pages: 1
split array of points into local neighbourhoods (Read 856 times)
split array of points into local neighbourhoods
Feb 9th, 2009, 7:47am
 
I am wondering if anyone can suggest a way to take a collection of points and break it into smaller pieces by specifying a threshold distance. Anything less than that distance would remain grouped together in one collection while anything beyond would create a new list. this is not about splitting into two lists but about finding local neighbourhoods amongst communities spread over large territories.

I am having trouble developing even a schematic version of how this might work...

any ideas are really appreciated.
Re: split array of points into local neighbourhood
Reply #1 - Feb 9th, 2009, 8:33am
 
A 'voronoi diagram' might be what you are looking for.
http://www.donhavey.com/blog/tutorials/tutorial-7-voronoi-diagrams/

Searching for 'vector quantization' may give you some info, too.
Re: split array of points into local neighbourhood
Reply #2 - Feb 9th, 2009, 4:19pm
 
thanks for that. I am already pretty familiar with Voronoi and am already using it for the sketch I am speaking. But I don't see how it would solve the problem.

Voronoi will divvy all space up (within a boundary) between all available points but I need lists of points where its constituents are within a certain distance of one another. I can't imagine that it's that hard but I can't wrap my head around it.

It would atart by checking the distance between each point and every other point in the base list. Then, those points outside the given limit would trigger the creation of a new Arraylist and add those points to it. The problem I see is figuring out how to consolidate those new lists if the points in the new arraylists are actually neighbours. If I could attach hand sketches it might help to describe what I want better....

if you have any other ideas...please! and thanks!
Re: split array of points into local neighbourhood
Reply #3 - Feb 9th, 2009, 10:18pm
 
There are some algorithms out there for doing cluster analysis.  I have read about them, but have never implemented them myself, so I'm not super confident that this explanation is going to be correct but I'll give it a shot:

The basic idea is that you posit the existence of some number of clusters, N, within the data.  Each cluster is described by a position P and a radius R.  Things that fall inside the circle {P, R} belong to the cluster, things that don't, don't.

The problem then is to establish actual values for P[i] and R[i] for values of i in [0..N-1].

The basic algorithm is that you seed them with arbitrary values (usually random points for the P values and some selected value for R), and then you iteratively adjust the values of P and R until enough of the data is covered by the clusters that you're happy.

In each iteration, you do more or less this:
* loop over all the data points
* for each point, calculate the probability that it belongs to each cluster (more on this later).  Assign it to the cluster for which it has the highest probability of belonging.
* for each cluster, recalculate the cluster's P value as some appropriately weighted average of the positions of the data points that belong to it, and the R value as some appropriately weighted average of the distance of the cluster's data points to the cluster's new center.
* stop when some threshhold of goodness has been reached.  This is often decided as "when an iteration passes and no data points change their cluster assignment status" or on the basis of covering 90% of the data, or whatever.

The bit about calculating the probability that a given data point belongs to a given cluster involves the fundamental assumption that the closer a given point is to a cluster's center, the more likely it is to belong to that cluster.  In practice, you use a gaussian distribution curve that uses the cluster's R value to establish its one-standard-deviation radius as the actual formula for the probability, although I believe that any monotonically declining function (e.g. a linear decay that reaches zero at radius R, 1/R, or anything else like that) should work too.  Gaussian distributions are usually used for cluster analysis because real world data often fits gaussian distributions around the "true" center.

The other wrinkle, which you've probably figured out already, is "how do you know, a-priori, how many clusters to look for?"  There are a couple of ways.  If you don't mind user input, you can just have the user click on perceived cluster centers, then click a "start" button when they've identified them all.  If you want an automatic solution, then usually the above algorithm is tweaked in such a way that R is constrained not to get bigger than some maximum value.  When the algorithm terminates (because no points changed ownership from one iteration to the next), you check to see if enough of the data was covered for you to call it a good solution.  If not, you increase the number of clusters by one and start over.  So more or less you ask "can you cover the data with 1 cluster that's no bigger than R=(whatever)?" And the algorithm comes back and says "Well, here's the best 1 cluster I can find, but only 75% of the data is covered," and you come back and say "no, not good enough.  How about 2 clusters?"  And you go back and forth until you hit your coverage goal.
Re: split array of points into local neighbourhood
Reply #4 - Feb 9th, 2009, 10:40pm
 
Thanks cloister. I still consider myself to be a bit of a noob to programming so I will need a bit of time to fully digest all that your solution entails. I didn't realise it would be such an involved process.

One thing I should make clear is that I am not really concerned so much with a circle of {P,R} since I am planning on feeding the new Arraylists to convex hull function. And I think the other difference in what I had imagined is that I wasn't so much concerned with finding centres of concentration ( as I am reading your solution to provide, though I may be misunderstanding) so much as a group of points where each point has at least one other neighbour within the given distance. Now that I think of it, I guess you might imagine a delaunay triangulation that would be severed at any edge over a certain length. I could already actually create such a thing, using Lee Byron's library (see below) and only rendering those links under a certain distance...I can do that - But I need it to be structural AKA divvying up into separate lists instead of just decoration....

http://leebyron.com/else/mesh/#newvoronoi

But I will chew on what you offered up...maybe it holds the answer

thanks again!
Re: split array of points into local neighbourhood
Reply #5 - Feb 10th, 2009, 2:11am
 
It sounds like you may have more stringent mathematical restrictions on your solutions, in which case the type of cluster finding that I described may not be appropriate.

The algorithm I outlined is good for situations where you have a who lot of points that fit some hidden distribution and you're trying to recover a more rigorous description of that hidden distribution from the data.  Or in other words, "these points were distributed according to some function f(); use the points to estimate what f() was in the first place."

It's great if you have some a-priori belief that the location of the points fits a model of statistically random scattering around one or more fixed centers.

If that's not your situation, then this algorithm may be essentially meaningless.

Either way, I did code it up today just for fun, so if you want to see the code let me know and I'll post it.
Re: split array of points into local neighbourhood
Reply #6 - Feb 10th, 2009, 5:28am
 
cloister, yes! I mean if you've already looked into it...it might be worth giving it a go. If you don't mind Smiley

There is nothing hidden exactly that I am trying to reveal. The whole thing, including my dataset which is partially driving the location of the points is not rigorous or scientific in any way. What I was after was the ability to map zones of occupation - but what I am mapping are particles which are doing their thing all over the screen...I wouldn't say that it is particularly meaningful or that there is some hidden pattern. This is all in the preparation for a landscape/architectural design...each of my particles might be thought of as spaces (or the center's of these spaces, carriers also of data)...once I instantiate them I am looking for zones where related spaces are ending up. Instead of drawing convex hulls around all similar particles spread over the entire screen, it would be more useful to speak about neighbourhoods or more discrete zones where these particles tend to end up...

to be demystified about what I have and what I am after check out my applet

http://www.upsdn.ca/processing/animal_particles_06g/applet/

if you hit "2" you can see a dalaunay triangulated mesh of all the particles...if they are beyond a certain distance that link is not drawn - but it's still there. I want to have distinct arraylists to be generated as portrayed by these links - these lists would then be sent to the convex hull function which you can see if you hit "3".

I really thought what I wanted to do was pretty straightforward - just that I didn't have the mind to see it done...I am open to any suggestions, and I'd love to see the solution you already came up with...

thanks again


Re: split array of points into local neighbourhood
Reply #7 - Feb 10th, 2009, 10:13pm
 
Sadly, the code is too long (at least, when it's well commented it is) to post here.  The forum won't let me.  Can I send it to you somewhere?

My other idea (after looking at your applet), is that you probably can do what you want just with some graph analysis.

The problem seems to me that you get these clusters of related animals that, to the eye, are obviously related but happen to be connected by one extra line to some other obvious cluster.  If you can find and eliminate those extra lines, then the clusters become well separated and are easy to find.

Those extra, single-line connections can be identified (and thus ignored) by determining whether the animals on either end of the connection share any immediate neighbors in the triangulation.  If they do, then the line stays.  If not, then you eliminate it.

After you eliminate the single-line connections, the rest of the graph breaks into clearly distinct clusters, plus single-animal outliers.  To identify the clusters, you would proceed as follows (and in this whole process, you only consider the connections in the triangulation that remain after doing the above stuff):

1. Mark all the animals as "unexamined"
2. Pick an unexamined animal
3. How many connections does it have?
4. if zero, then this animal is an outlier.  Mark it as examined and go to step 6.
5. otherwise, call a recursive graph-walking function to find the set of animals that are properly connected to this one.
6. continue from step 2 so long as there are any unexamined animals.

The recursive graph walking function would take an animal and an arraylist as parameters, and do the following:
1. Add the animal to the arraylist
2. Mark the animal as examined.
3. loop over each of the input animal's connections
4. if the animal on the other end of that connection has NOT been examined, then call the recursive function with that animal and the same list as parameters.

When the recursive function finally exits, the arraylist will contain all the animals that were in that group, and all of them will have been marked as "examined"
Page Index Toggle Pages: 1