Thanks, MattD. Hopefully you won't be sorry you asked, but here are (1) a brief explanation and (2) some structure from motion resources:
1. Brief intuition: In Processing/OpenGL, you can feed a list of 3D vertices into a camera matrix to get out a bunch of 2D points (the image you see).
What if we do it backwards -- all we have are the 2D points (the image), and we want to figure out the 3D vertices and the camera matrix? This is impossible with only one image, but if we have two or more images of the SAME 3D scene from different viewpoints, and we can match up the same points across those images, there is essentially only one configuration of 3D vertices (structure) and camera matrices (motion) that could have produced the images (up to the overall scale and placement of the scene, which the images alone can't pin down). So, to gloss over many details, we start from a rough guess and iteratively search for this correct configuration, getting closer and closer to the right answer at each step.
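If it helps to see the forward and backward directions in code, here is a minimal Python sketch. It is a toy under loud assumptions, not what the applet or any real pipeline does: the cameras are simple pinholes with no rotation, the camera positions are treated as already known, and only a single 3D point is searched for. The names `project`, `reprojection_error` and `refine`, the step size and the step count are all made up for illustration.

```python
# Toy pinhole camera (an assumption for illustration): it sits at cam_pos,
# looks straight down +Z with no rotation, and has focal length f.
def project(point, cam_pos, f=1.0):
    """Forward direction: 3D point -> 2D image point."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    return (f * x / z, f * y / z)


def reprojection_error(point, cams, observations):
    """Sum of squared distances between predicted and observed 2D points."""
    err = 0.0
    for cam, (u, v) in zip(cams, observations):
        pu, pv = project(point, cam)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err


def refine(guess, cams, observations, steps=5000, lr=1.0, eps=1e-6):
    """Backward direction: plain gradient descent on the reprojection error.

    The gradient is estimated numerically with central differences, so no
    calculus is needed. Each step nudges the guess a little downhill.
    """
    p = list(guess)
    for _ in range(steps):
        grad = []
        for i in range(3):
            hi = list(p)
            lo = list(p)
            hi[i] += eps
            lo[i] -= eps
            diff = (reprojection_error(hi, cams, observations)
                    - reprojection_error(lo, cams, observations))
            grad.append(diff / (2 * eps))
        p = [p[i] - lr * grad[i] for i in range(3)]
    return p


# Two viewpoints of the same scene (positions assumed known in this toy).
cams = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
true_point = (0.3, -0.2, 4.0)

# "Take the pictures": all the search gets to see are these 2D points.
observations = [project(true_point, c) for c in cams]

# Start from a wrong guess and let the search pull it toward the truth.
estimate = refine((0.0, 0.0, 3.0), cams, observations)
print("recovered point:", estimate)
```

With only one camera in `cams`, every point along the same viewing ray gives zero error, which is the "impossible with only one image" case. Real structure from motion also treats the camera matrices as unknowns and solves for all points and cameras together, with much better optimizers than this one.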
2. It's tough to find a really good introductory explanation of structure from motion techniques, mostly because the write-ups tend to jump to the math before the intuition is established, but this document has the gentlest introduction I could find: http://www1.cs.columbia.edu/~jebara/htmlpapers/SFM/sfm.html
A nice recent overview of a whole pipeline is Noah Snavely's original Photo Tourism work at SIGGRAPH: http://phototour.cs.washington.edu/Photo_Tourism.pdf
And here's the textbook if you really want to dive in: http://books.google.com/books?id=si3R3Pfa98QC
I hope all that points you in the right direction. Note that this applet isn't doing the structure from motion at runtime -- it is just displaying the results.
-Grant