Well, if you've already thresholded it so that you only have black and white pixels, that already makes it a step easier.
What you could do is make a 2D iso-surface (an iso-contour, which is essentially what the marching squares technique produces). Imagine you're looking really, really closely at a group of pixels, so that they appear like a grid of squares, like in the left image here:

Now imagine that each pixel is a node, and there are graph lines connecting each pixel to its immediate neighbors, as on the right.
Color these graph lines green if they are between two pixel nodes of the same color, and red otherwise.
Now the center points of the red graph lines can be connected in each "square" formed by four adjoining pixels.
This will give you the points you need to draw the outline of the hand for one frame. Repeat this for all the frames, and then add a Z-coordinate based on the frame number.
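To make the per-frame steps above a bit more concrete, here is a rough Python sketch. It assumes each frame is a list of rows with 1 for black (hand) and 0 for white, treats the pixel at row r, column c as a node at x=c, y=r, and uses the frame index directly as Z. The function names frame_segments and stack_outlines are just made up for illustration.

```python
def frame_segments(img, z=0):
    """Return outline segments, as pairs of (x, y, z) points, for one binary frame."""
    segments = []
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            # The four pixel nodes at the corners of this "square".
            tl, tr = img[r][c], img[r][c + 1]
            bl, br = img[r + 1][c], img[r + 1][c + 1]
            # Center points of the four graph lines around the square.
            top = (c + 0.5, r, z)
            right = (c + 1, r + 0.5, z)
            bottom = (c + 0.5, r + 1, z)
            left = (c, r + 0.5, z)
            # Keep only the "red" lines, i.e. those whose two ends differ in color.
            red = []
            if tl != tr:
                red.append(top)
            if tr != br:
                red.append(right)
            if bl != br:
                red.append(bottom)
            if tl != bl:
                red.append(left)
            if len(red) == 2:
                segments.append((red[0], red[1]))
            elif len(red) == 4:
                # All four lines red: cut off the two white corners so that
                # the black corners stay joined (an assumption, see note 1).
                if tl:
                    segments += [(top, right), (bottom, left)]
                else:
                    segments += [(left, top), (right, bottom)]
    return segments


def stack_outlines(frames):
    """Collect the segments of every frame, using the frame number as Z."""
    out = []
    for z, img in enumerate(frames):
        out.extend(frame_segments(img, z))
    return out
```

For a single black pixel in the middle of a 3x3 frame, this gives four segments forming a small diamond around that pixel.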
Five things to note:
1) There is a square that has an ambiguous case - when all four of its graph lines are red, as in the bottom center square. In this case, I suggest you join the corners as I have above, as it's more likely that the black squares are joined, not the white ones (black squares being the solid hand). That's up to you, of course.
2) This is a lot of work, both in terms of coding and computing power. You probably will not be able to do this for live video!
3) You might not have to do this for every pixel - you might be satisfied if you sampled the image at a less fine rate.
4) Since you want to draw the black part as solid, you'll need to work out which side of the line is the black half, and draw the appropriate shape in each square too. For an all-black region, this will be a square. For one black point, a triangle. For three, an irregular pentagon. For two, either a rectangle (if the black points are adjacent) or an irregular hexagon (if they are diagonal, as in the ambiguous case above).
5) You don't have to use the midpoint of each graph line - you can use a point a fixed distance away from the black side (probably a very small distance). This will give you a very tight edge.
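Notes 4 and 5 can be sketched together, if it helps. Walking once around the corners of a square, and adding each black corner plus a point on each red graph line, yields the shapes listed in note 4. This is only a sketch under some assumptions: the square is a unit cell with corners given in the order top-left, top-right, bottom-right, bottom-left, 1 means black, and the hypothetical parameter t is the distance from the black end of a red line (0.5 is the midpoint, a small value gives the tight edge from note 5).

```python
# Unit-cell corners in walking order: top-left, top-right, bottom-right, bottom-left.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def black_polygon(colors, t=0.5):
    """Return the vertices of the solid (black) shape inside one square.

    colors holds the four corner values in CORNERS order (1 = black).
    t is how far along a red graph line, measured from its black end,
    the outline point is placed.
    """
    poly = []
    for i in range(4):
        j = (i + 1) % 4
        (ax, ay), (bx, by) = CORNERS[i], CORNERS[j]
        if colors[i]:
            # Black corners are vertices of the solid shape.
            poly.append((ax, ay))
        if colors[i] != colors[j]:
            # Red line: place the outline point t away from the black end.
            if colors[i]:
                poly.append((ax + t * (bx - ax), ay + t * (by - ay)))
            else:
                poly.append((bx + t * (ax - bx), by + t * (ay - by)))
    return poly
```

With two diagonal black corners the walk produces a single hexagon, which matches the choice in note 1 of joining the black squares. If you would rather join the white ones, that case would need to return two separate triangles instead.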
Thoughts/questions?