[Multiple Kinects] area sensor

Hi all,

This request is still in the planning stage. I have to cover an area of about 10 x 10 m with a crowd in it, and I need to get the center-of-mass (COM) x,y coordinates of most of the people.

Is it possible to do this with 3 or 4 Kinects?



  • Center of mass might be hard to do with just depth data. Consider a room with two people in it holding two cardboard boxes of similar size. One box is empty. The other contains lead weights. To any number of Kinects, both people and both boxes will look the same... So how could you determine that the COM is closer to the box of lead weights?

    Would you settle for center of volume instead?
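    To make "center of volume" concrete: with a camera looking straight down, each depth pixel can be treated as a solid column from the floor up to the surface the camera sees, with the column heights used as weights. This is a minimal sketch under stated assumptions: depths are in metres on a plain 2D list, the camera-to-floor distance (`floor_dist`) is known, and non-positive values mean "no reading" (check your driver's convention). `center_of_volume` is a hypothetical helper, and it ignores perspective (pixels farther from the camera cover more floor area).

```python
from typing import List, Optional, Tuple


def center_of_volume(depth: List[List[float]],
                     floor_dist: float) -> Optional[Tuple[float, float, float]]:
    """Approximate center of volume of everything above the floor.

    Each pixel is treated as a solid column from the floor up to the surface
    the camera sees. Returns (col, row, height_above_floor), or None when
    nothing rises above the floor.
    """
    total = sx = sy = sz = 0.0
    for row, line in enumerate(depth):
        for col, d in enumerate(line):
            if d <= 0:                 # assumed convention: no reading
                continue
            h = floor_dist - d         # column height above the floor
            if h <= 0:                 # floor pixel, or noise below it
                continue
            total += h
            sx += h * col
            sy += h * row
            sz += h * (h / 2)          # a column's own centroid sits at half its height
    if total == 0:
        return None
    return sx / total, sy / total, sz / total
```

    Like any depth-only method, this still cannot tell the empty box from the one full of lead; it only measures occupied space.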

  • @TfGuy44 Pixels have mass: each one has a mass of one unit. The good news is that this does not depend on color, and it makes more sense if you work with a slice of your depth data.
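    A minimal sketch of the unit-mass idea, assuming a 2D list of depths in metres: keep only the pixels whose depth falls inside a slice `[near, far)` and average their image coordinates. The function name is hypothetical, and the slice bounds are something you would choose for your own setup.

```python
from typing import List, Optional, Tuple


def slice_centroid(depth: List[List[float]], near: float,
                   far: float) -> Optional[Tuple[float, float]]:
    """Centroid (col, row) of every pixel whose depth lies in [near, far).

    Each pixel in the slice counts as one unit of mass, regardless of color
    or of how far inside the slice it is.
    """
    count = 0
    sx = sy = 0
    for row, line in enumerate(depth):
        for col, d in enumerate(line):
            if near <= d < far:
                count += 1
                sx += col
                sy += row
    if count == 0:
        return None  # nothing in this slice
    return sx / count, sy / count
```

    For example, `slice_centroid(depth, 1.0, 1.5)` averages the positions of every pixel between 1.0 m and 1.5 m from the camera.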

    My question to you, @eduzal: why four Kinects? Are you planning to track each subject? The COM will depend on the perspective of each camera and its reference frame. For instance, one camera sees a person at the edge of its image, so it reports a COM close to the edge; a second camera sees the same person at the center. How do the two relate? Are you mapping the camera positions against each other?
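    One way to relate the cameras is to convert every detection into a shared room coordinate system. This sketch assumes each camera points straight down, that its pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) come from a calibration you do yourself, that you measure where its optical axis meets the floor and its rotation (`yaw`) in the room, and that the depth value is the distance along the optical axis. The class and field names are hypothetical. Depending on how a camera is mounted, its image may be mirrored relative to your floor plan; in that case flip the sign of `yc`.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TopDownCamera:
    """Hypothetical description of one ceiling-mounted, downward-facing camera."""
    fx: float      # focal length in pixels (x)
    fy: float      # focal length in pixels (y)
    cx: float      # principal point x (pixels)
    cy: float      # principal point y (pixels)
    room_x: float  # where the optical axis meets the floor, room metres
    room_y: float
    yaw: float     # rotation of the image x axis relative to the room x axis (radians)

    def pixel_to_room(self, u: float, v: float,
                      depth: float) -> Tuple[float, float]:
        # Pinhole back-projection into the camera's own metric frame.
        xc = (u - self.cx) * depth / self.fx
        yc = (v - self.cy) * depth / self.fy
        # Rotate by the camera's yaw, then translate to its place in the room.
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (self.room_x + c * xc - s * yc,
                self.room_y + s * xc + c * yc)
```

    Once every camera reports room metres, "edge of one image" and "center of another" become the same point on the floor.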

    Also, how would you handle the case where one camera sees three people while another sees only two, because one of them is not visible to it? I am just trying to understand how the streams from four cameras could be used to estimate the COM, so yes, I'm thinking out loud.
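    For the occlusion case, one simple approach (not something proposed in the thread) is to pool the room-space detections from all cameras and greedily merge points that lie within some radius of each other: a person seen by two cameras collapses to one point, while a person only one camera can see is kept as-is. `merge_detections` and the radius are hypothetical; the result depends on input order, and two people standing closer than the radius will be merged.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def merge_detections(points: List[Point], radius: float) -> List[Point]:
    """Greedily merge room-space detections closer than `radius` (metres).

    Each cluster is represented by the mean of its members so far; a point
    joins the first cluster whose mean is within `radius`, otherwise it
    starts a new cluster.
    """
    clusters: List[List[float]] = []  # each entry: [sum_x, sum_y, count]
    for x, y in points:
        for cluster in clusters:
            mx = cluster[0] / cluster[2]
            my = cluster[1] / cluster[2]
            if (x - mx) ** 2 + (y - my) ** 2 <= radius ** 2:
                cluster[0] += x
                cluster[1] += y
                cluster[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(sx / n, sy / n) for sx, sy, n in clusters]
```

    For example, if camera A reports `[(1.0, 1.0), (4.0, 4.0), (7.0, 2.0)]` and camera B, which cannot see the third person, reports `[(1.1, 1.0), (4.0, 4.1)]`, merging the combined list with a radius of 0.4 m gives three people at roughly (1.05, 1.0), (4.0, 4.05) and (7.0, 2.0).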


  • edited January 2018

    Is the crowd standing on flat terrain, with no furniture?

    Is there a ceiling that you can mount cameras on? If it is an open space, can you mount cameras with a bird's-eye view, e.g. on tall poles?

    Due to the nature of standing human bodies and crowds, one top-down camera will give you more and better individual center estimates for a crowd of people than triangulating a ring of horizontal cameras, assuming you only need centers and not silhouettes. To cover more space, align the frustum of each additional top-down camera to a different square about 6 ft (~1.8 m) off the ground. The frustum geometry differs between Kinect models, and there are tools for calculating it: https://smeenk.com/kinect-field-of-view-comparison/
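    To check the ~6 ft square idea against a room size: a downward-facing camera sees a rectangle of 2 * d * tan(FOV / 2) in each direction on a horizontal plane at distance d from the lens. This sketch assumes a flat, non-overlapping grid of identical cameras; the function names are hypothetical, and the field-of-view values in the example (70.6 x 60 degrees, the commonly published Kinect v2 depth FOV) should be checked against the comparison page above for your model.

```python
import math
from typing import Tuple


def footprint(distance_m: float, hfov_deg: float,
              vfov_deg: float) -> Tuple[float, float]:
    """Width and depth (metres) of the area a downward-facing camera sees
    on a horizontal plane `distance_m` below the lens."""
    width = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    depth = 2 * distance_m * math.tan(math.radians(vfov_deg) / 2)
    return width, depth


def cameras_for_room(room_w: float, room_d: float, mount_h: float,
                     plane_h: float, hfov_deg: float, vfov_deg: float) -> int:
    """Number of identical cameras needed to tile the room, without overlap,
    at a plane `plane_h` metres above the floor (e.g. head height)."""
    width, depth = footprint(mount_h - plane_h, hfov_deg, vfov_deg)
    return math.ceil(room_w / width) * math.ceil(room_d / depth)
```

    With the 5 m ceiling grid mentioned later in the thread and a plane at 1.83 m (6 ft), `cameras_for_room(10, 10, 5.0, 1.83, 70.6, 60.0)` returns 9 under these assumptions; allowing some overlap between footprints would raise that number.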


    Each figure shows up as a bump (their head) on the depth image. You can approximate the center of each figure's body by finding the center of the head, then taking the midpoint between the top of that bump and the floor, whose distance from the camera is known. This technique doesn't work on sunbathers (head not upright) or Olympic sprinters (body not under head), and it could have problems if people raise their hands (fake heads) or carry umbrellas (huge, tall head, high fake center). But for a simple crowd of people walking, you don't need anything fancier.
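    The head-then-midpoint estimate could be sketched like this, assuming the depth patch already contains a single person (for example one blob), depths are in metres from a downward-facing camera, and `floor_dist` is the known camera-to-floor distance. The head is taken to be every pixel within `head_tol` of the nearest point; `body_center_from_head` and the 0.15 m tolerance are hypothetical choices.

```python
from typing import List, Optional, Tuple


def body_center_from_head(depth: List[List[float]], floor_dist: float,
                          head_tol: float = 0.15
                          ) -> Optional[Tuple[float, float, float]]:
    """Estimate one person's body center from a top-down depth patch.

    Returns (col, row, center_height_above_floor), or None if nothing in the
    patch is above the floor.
    """
    # Valid pixels that are closer to the camera than the floor.
    valid = [(col, row, d)
             for row, line in enumerate(depth)
             for col, d in enumerate(line)
             if 0 < d < floor_dist]
    if not valid:
        return None
    top = min(d for _, _, d in valid)  # top of the head bump
    head = [(col, row) for col, row, d in valid if d <= top + head_tol]
    # Center of the head bump in image coordinates.
    hx = sum(col for col, _ in head) / len(head)
    hy = sum(row for _, row in head) / len(head)
    # Midpoint between the top of the head and the floor.
    return hx, hy, (floor_dist - top) / 2
```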

  • Thank you all, @TfGuy44 @kfrajer @jeremydouglass, for your answers.

    The situation is closer to the one described by jeremydouglass. It will be an empty, flat 10 x 10 m room with people walking through it; I don't know the exact height yet, but it's about 5 m to the ceiling grid. Whether people are carrying boxes, bags or anything else is irrelevant. What I'm looking for as input is their x,y coordinates in the room.

    I did something like this a few years ago with two IR cameras mounted on the ceiling about 4 m high: I processed their images with CCV, sent each blob centroid via TUIO to a Processing sketch, did my processing there, and sent the resulting image via Syphon to Resolume (two projectors with soft-edge blending). https://instagram.com/p/f_M0OZJSua/?taken-by=eduzal

    CCV has been abandoned (its last update is from 2011), and I'm looking for a solution with those capabilities built in, like the Kinect. I've found no literature on stitching Kinects together or networking them, i.e. making them work together to cover a given volume. I've found some material dealing with mocap or 3D scanning, but that is beyond my needs.

    My task now is to create a sound installation in this room: a quadraphonic system with one speaker in each corner. I intend to manipulate recorded samples according to the position and number of people in the room, like a Reactable, and I'm looking for a way to build this input system.
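    One common way to map an x,y position onto four corner speakers (a technique not discussed in the thread) is bilinear equal-power panning: the bilinear weights of the four corners always sum to 1, so taking their square roots keeps the total power constant as a person walks around. The sketch assumes a 10 x 10 m room with the origin at one speaker; the function name and corner labels are hypothetical.

```python
import math
from typing import Dict


def corner_gains(x: float, y: float, room_w: float = 10.0,
                 room_d: float = 10.0) -> Dict[str, float]:
    """Equal-power gains for four corner speakers, given a position (x, y)
    in room metres with the origin at one corner."""
    u = min(max(x / room_w, 0.0), 1.0)  # clamp positions to the room
    v = min(max(y / room_d, 0.0), 1.0)
    weights = {
        "corner_0_0": (1 - u) * (1 - v),
        "corner_w_0": u * (1 - v),
        "corner_0_d": (1 - u) * v,
        "corner_w_d": u * v,
    }
    # Bilinear weights sum to 1, so the squares of their square roots also
    # sum to 1: the total power stays constant anywhere in the room.
    return {name: math.sqrt(w) for name, w in weights.items()}
```

    A person standing in the middle of the room gets a gain of 0.5 on every speaker; one standing at the origin is heard only from that corner.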

    Thank you for your contributions.

  • Great! It sounds like you don't even need their body center based on height and distance to the floor; you really just need the tops of their heads, so just take the Kinect depth map and find its local maxima.
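    In code, "find the local maxima" could look like the sketch below: convert depth to height above the floor, keep pixels that are at least as high as their eight neighbours and above a minimum height, then drop peaks that are too close to a taller one (which also collapses flat-topped plateaus into one peak). It assumes a top-down camera with depths in metres; `head_peaks`, `min_height` and `min_sep` (in pixels) are hypothetical names and values you would tune.

```python
from typing import List, Tuple


def head_peaks(depth: List[List[float]], floor_dist: float,
               min_height: float = 1.0, min_sep: int = 5) -> List[Tuple[int, int]]:
    """Return (col, row) of local height maxima, i.e. candidate head tops."""
    # Height above the floor; non-positive depth is treated as "no reading".
    h = [[(floor_dist - d) if d > 0 else 0.0 for d in line] for line in depth]
    rows, cols = len(h), len(h[0])
    candidates = []
    for r in range(rows):
        for c in range(cols):
            v = h[r][c]
            if v < min_height:
                continue
            neighbours = [h[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, rows))
                          for cc in range(max(c - 1, 0), min(c + 2, cols))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                candidates.append((v, c, r))
    # Non-maximum suppression: keep the tallest first, drop nearby duplicates.
    peaks: List[Tuple[int, int]] = []
    for v, c, r in sorted(candidates, reverse=True):
        if all((c - pc) ** 2 + (r - pr) ** 2 >= min_sep ** 2 for pc, pr in peaks):
            peaks.append((c, r))
    return peaks
```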

    Regarding stitching Kinects: once you have a multi-Kinect setup ( https://github.com/shiffman/OpenKinect-for-Processing/issues/17 ), perhaps you could draw scaled-down Kinect depth maps (e.g. in a 2x2 grid) into one PGraphics frame, then take the local maxima of that image to get your point array.
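    A plain-list version of the stitching idea, standing in for drawing into a PGraphics frame: shrink each depth map, then lay the maps out in a grid. The sketch keeps the nearest depth in each block when shrinking (min-pooling, a choice made here rather than anything from the thread) so head tops are not averaged away. It assumes the cameras hang at the same height and orientation in a regular grid with no overlap; otherwise a person standing on a seam can appear twice or be split. The function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def downscale_min(depth: Grid, k: int) -> Grid:
    """Shrink a depth map by `k` in each direction, keeping the nearest
    (smallest positive) depth in each k x k block. Leftover rows/columns
    that do not fill a whole block are dropped."""
    out = []
    for r in range(0, len(depth) - k + 1, k):
        row = []
        for c in range(0, len(depth[0]) - k + 1, k):
            block = [depth[rr][cc]
                     for rr in range(r, r + k)
                     for cc in range(c, c + k)
                     if depth[rr][cc] > 0]
            row.append(min(block) if block else 0.0)
        out.append(row)
    return out


def stitch_grid(maps: List[Grid], cols: int) -> Grid:
    """Lay equally sized maps out row by row, `cols` maps per row; empty
    slots are filled with 0.0."""
    h, w = len(maps[0]), len(maps[0][0])
    rows = (len(maps) + cols - 1) // cols
    canvas = [[0.0] * (w * cols) for _ in range(h * rows)]
    for i, m in enumerate(maps):
        oy, ox = (i // cols) * h, (i % cols) * w
        for r in range(h):
            canvas[oy + r][ox:ox + w] = m[r]
    return canvas
```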

    Possibly relevant:

  • I started thinking about COM because I didn't want to spend a long time on a scaffold or ladder mounting the Kinects on the ceiling; my first intention was to place the Kinects close to the speakers.

    @jeremydouglass, you wrote about local maxima. I didn't get it.

  • I just meant that the stitched Kinect depth maps are like topography, so the list of heads (the list of highest points) is a list of local maxima:

    You can do this in a lot of ways, e.g. with blob detection. It sounds like you already have experience turning a top-down image into a list of centroids.
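    Since you have done blob tracking before, here is a minimal blob-detection sketch for the same job: threshold the height above the floor, flood-fill 4-connected regions, and return each region's centroid. It assumes depths in metres from a top-down camera; the function name, `min_height` and `min_area` are hypothetical, and two people touching shoulders would come out as one blob.

```python
from collections import deque
from typing import List, Tuple


def blob_centroids(depth: List[List[float]], floor_dist: float,
                   min_height: float = 1.0,
                   min_area: int = 20) -> List[Tuple[float, float]]:
    """Return (col, row) centroids of 4-connected regions taller than
    `min_height` above the floor and covering at least `min_area` pixels."""
    rows, cols = len(depth), len(depth[0])
    mask = [[0 < depth[r][c] and floor_dist - depth[r][c] >= min_height
             for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one blob, accumulating its pixel coordinates.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            n = sx = sy = 0
            while queue:
                r, c = queue.popleft()
                n += 1
                sx += c
                sy += r
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if n >= min_area:  # ignore specks of noise
                centroids.append((sx / n, sy / n))
    return centroids
```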

  • Yes. Centroids are all I need.
