How to measure the mean distance of two classes from the flock centroid when the flock splits?

edited February 2014 in How To...

Hi,

I'm a beginner at coding and am creating a boids model to simulate parasitized and unparasitized sheep. The aim is to predict the prevalence level at which we can detect parasitized sheep from their position in the flock.

I need to measure the mean distance of my two classes of boids from the flock centroid and record this via saveTable. I've found some instructions for how to do that in Nature of Code and on Stack Overflow, but what happens when the flock splits into two or more clusters?

The splitting into clusters is realistic, so I don't want to adjust the flocking to prevent it at all, but it complicates how to measure the mean distance from the flock centroid.

Since the proportion of the parasitized class to the non-parasitized class will vary, my results could be due to the higher probability of the minority class happening to go with the majority class by chance, rather than to the different weighting values I input for each class's flocking interaction rules.

There is a 'neighbourhood' variable in the model I'm modifying, representing the organisms' limited sensory fields, so perhaps I could use that rather than the simple flock centroid? Any hints on how, please?

Thanks v much!

Answers

  • What you're touching on is an entire field in its own right: http://en.wikipedia.org/wiki/Cluster_analysis

    There are several algorithms for figuring out clusters. Which one you choose depends entirely on several things:

    Do you know how many clusters you have ahead of time? Or do you have to figure that out as well? Does an instance "know" which cluster it is in ahead of time? Or can it change depending on the instance's location?

  • Further question:

    If I use the code for finding the flock centroid but apply it to both classes separately, and define a function to calculate the difference, is that conceptually the same thing, or not?

  • I'm really not sure what you're asking. The same thing as what?

    So you have two types, let's call them red and blue. You want to figure out the centroid of the red ones and compare that to the centroid of the blue ones?

    Or do you want to take a look at a mixed group and categorize each member into either red or blue depending on their position?

    Or something else?

  • edited February 2014

    Do you know how many clusters you have ahead of time?

    No

    Or do you have to figure that out as well?

    Yes

    Does an instance "know" which cluster it is in ahead of time, or can it change depending on the instance's location?

    Depends on the instance's location.

    So you have two types, let's call them red and blue. You want to figure out the centroid of the red ones and compare that to the centroid of the blue ones?

    Compare the centroid of the red ones and the centroid of the blue ones to the centroid of all red and blue ones.

    Or is that the same thing as comparing the centroid of the red ones with the centroid of the blue ones?
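On that last question, a small sketch may help. The centroid of the whole flock is the count-weighted average of the two class centroids, so the distance from the red centroid to the overall centroid is just the red-to-blue distance scaled by the blue share of the flock (and vice versa). The two comparisons therefore carry the same information, and the minority class centroid will always sit further from the overall centroid. The class and method names below are made up for illustration and are not from the model discussed in this thread.

```java
// Sketch only: names are illustrative, not taken from the boids model in this thread.
final class CentroidComparison {

    // Centroid (mean x, mean y) of 2D points given as {x, y} pairs.
    static double[] centroid(double[][] pts) {
        double sx = 0, sy = 0;
        for (double[] p : pts) {
            sx += p[0];
            sy += p[1];
        }
        return new double[] { sx / pts.length, sy / pts.length };
    }

    // Centroid of both classes pooled together (the whole flock).
    static double[] pooledCentroid(double[][] a, double[][] b) {
        double[][] all = new double[a.length + b.length][];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        return centroid(all);
    }

    // Euclidean distance between two 2D points.
    static double dist(double[] p, double[] q) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }

    public static void main(String[] args) {
        // Made-up positions: 2 red boids, 6 blue boids.
        double[][] red = { {0, 0}, {2, 0} };
        double[][] blue = { {8, 0}, {10, 0}, {9, 1}, {9, -1}, {9, 2}, {9, -2} };

        double[] cRed = centroid(red);               // (1, 0)
        double[] cBlue = centroid(blue);             // (9, 0)
        double[] cAll = pooledCentroid(red, blue);   // (7, 0)

        System.out.println("red to blue: " + dist(cRed, cBlue)); // 8.0
        System.out.println("red to all:  " + dist(cRed, cAll));  // 6.0 = 8 * 6/8
        System.out.println("blue to all: " + dist(cBlue, cAll)); // 2.0 = 8 * 2/8
    }
}
```

Because of this dependence on class proportions, a per-boid distance to a centroid is probably a more useful measure than a centroid-to-centroid distance when prevalence varies.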

  • Here's what's confusing me: you have two types of objects, but you want to cluster all of your objects depending on location (not class) into an unknown number (so not always 2) of clusters. It sounds like you might have three clusters, each with objects from both types of object. So it becomes meaningless to compute the centroid of the "flock" (cluster?) of a single type of object, since they can be split into multiple clusters along with the other type of object. Or am I misunderstanding something? Do you have any graphical representations (images) of example flocks and clusters?

  • edited March 2014

    It's confusing me too!

    Here's what I've got so far - Parasitized Sheep Boids on YouTube

    The purpose of identifying the clusters is to measure the distance of each boid from the centroid of its nearest cluster or the cluster it's associated with, in each time step, in order to show that 'parasitized' (slightly higher wandering) boids tend to be on the peripheries of their group, even when the flock splits up into clusters.

    I'm using 'flock' to mean all of them, and 'cluster' for the sub-flocks that form because the wandering and flocking functions are quite evenly balanced. Hopefully the video will make more sense than my description.

    I've found some instructions in Darren Croft, Richard James and Jens Krause, Exploring Animal Social Networks, 2008, pp.22-23, but I'm now looking for the nearest matching function in a Processing library, or in an R package I can connect to Processing via Rserve.

    Another student on a related project has GPS tracking data on sheep, which looks broadly similar, but obviously the real sheep vary in speed too, wander even more, and move more slowly relative to the size of the field most of the time.

  • Okay, I think I understand it a little better now. Let me maybe define a couple terms:

    Object: a particular instance in your scene. It's basically a point.
    Flock: every object in your scene.
    Species: each object belongs to either Species A or Species B.
    Cluster: a group of objects located close to each other. A cluster can contain both species, and there can be any number of clusters in your scene.

    Now, your goal is to come up with clusters for a particular scene, and then measure the distance of every object to the centroid of its respective cluster. You then want to compare the average distance of objects in Species A versus Species B.
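As a rough sketch of that measurement, assuming each object has already been given a cluster label by whatever clustering method you end up using: compute each cluster's centroid, take every object's distance to its own cluster's centroid, and average those distances per species. The class name, method name and array layout below are hypothetical and chosen only to keep the example small.

```java
// Sketch only: ClusterDistance and its array-based inputs are hypothetical,
// not part of the boids model or of any library.
final class ClusterDistance {

    // x, y:     position of each object
    // species:  0 or 1 for each object (e.g. unparasitized / parasitized)
    // cluster:  cluster label for each object, in the range 0..k-1
    // Returns {mean distance for species 0, mean distance for species 1},
    // where distance is measured to the centroid of the object's own cluster.
    static double[] meanDistanceBySpecies(double[] x, double[] y,
                                          int[] species, int[] cluster, int k) {
        double[] cx = new double[k];
        double[] cy = new double[k];
        int[] size = new int[k];

        // Sum positions per cluster, then divide to get each centroid.
        for (int i = 0; i < x.length; i++) {
            cx[cluster[i]] += x[i];
            cy[cluster[i]] += y[i];
            size[cluster[i]]++;
        }
        for (int c = 0; c < k; c++) {
            if (size[c] > 0) {
                cx[c] /= size[c];
                cy[c] /= size[c];
            }
        }

        // Accumulate distance-to-own-centroid per species.
        double[] sum = new double[2];
        int[] count = new int[2];
        for (int i = 0; i < x.length; i++) {
            int c = cluster[i];
            sum[species[i]] += Math.hypot(x[i] - cx[c], y[i] - cy[c]);
            count[species[i]]++;
        }
        return new double[] {
            count[0] == 0 ? Double.NaN : sum[0] / count[0],
            count[1] == 0 ? Double.NaN : sum[1] / count[1]
        };
    }

    public static void main(String[] args) {
        // Made-up scene: cluster 0 has three objects, cluster 1 has two.
        double[] x = { 0, 2, 1, 10, 14 };
        double[] y = { 0, 0, 3, 10, 10 };
        int[] species = { 0, 0, 1, 0, 1 };
        int[] cluster = { 0, 0, 0, 1, 1 };
        double[] mean = meanDistanceBySpecies(x, y, species, cluster, 2);
        System.out.println("species 0: " + mean[0]);
        System.out.println("species 1: " + mean[1]);
    }
}
```

One thing to watch: an object that ends up alone in its own cluster is its own centroid and contributes a distance of zero, so you may want to exclude or separately count singleton clusters before averaging.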

    All of that being said, I'm not sure which step you're stuck on. Your first step would be to get the objects in your scene wandering around and flocking however you want. Your second step would be to pick out and implement a clustering algorithm. Your third step would be to actually do the calculations of your average.
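For the second step, one simple option (among many) when the number of clusters is unknown is to treat two objects as belonging to the same cluster whenever they are linked by a chain of neighbours, each hop no longer than some radius. The sketch below does this with a small union-find structure. Using the model's existing 'neighbourhood' distance as the radius is an assumption on my part, and the class and method names are invented for the example.

```java
// Sketch only: RadiusClusters is a hypothetical helper, not a library class.
// It groups points into connected components under a distance threshold.
final class RadiusClusters {

    // Find the representative of i's group, shortening the path as we go.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Returns a cluster label (0..k-1) for each point. Two points share a
    // label if a chain of hops, each of length <= radius, connects them.
    static int[] label(double[] x, double[] y, double radius) {
        int n = x.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }

        // Merge the groups of every pair of points within the radius.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (Math.hypot(x[i] - x[j], y[i] - y[j]) <= radius) {
                    int ri = find(parent, i);
                    int rj = find(parent, j);
                    if (ri != rj) {
                        parent[rj] = ri;
                    }
                }
            }
        }

        // Renumber group representatives as consecutive labels 0, 1, 2, ...
        int[] rootLabel = new int[n];
        java.util.Arrays.fill(rootLabel, -1);
        int[] labels = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            int r = find(parent, i);
            if (rootLabel[r] < 0) {
                rootLabel[r] = next++;
            }
            labels[i] = rootLabel[r];
        }
        return labels;
    }

    // Number of clusters implied by a label array produced by label().
    static int countClusters(int[] labels) {
        int max = -1;
        for (int l : labels) {
            max = Math.max(max, l);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        double[] x = { 0, 1, 2, 10, 11 };
        double[] y = { 0, 0, 0, 0, 0 };
        int[] labels = label(x, y, 1.5);
        System.out.println(java.util.Arrays.toString(labels)); // [0, 0, 0, 1, 1]
        System.out.println("clusters: " + countClusters(labels)); // clusters: 2
    }
}
```

The pairwise loop is quadratic in the number of objects, which should be fine for a flock of a few hundred boids per time step. The result depends heavily on the radius, so it is worth trying a few values and checking the labels against what you see on screen.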

    If you're asking about tweaking the parameters of a particular approach, these types of algorithms are notoriously finicky when it comes to changing their parameters. In other words, the best way is to just try something and see what happens.
