First off, if you have a little extra cash lying around, this is a very fun thing to do. I had been meaning to try it for a while, and since I searched around and couldn't find anyone talking about it, I thought I would share what I did this weekend. I wanted to render one 720p frame per line of a 10k-line data file. To keep my options open, I saved each frame as a PNG, which took a good second per frame even on my mini9's SSD. That works out to about three hours of rendering for roughly five minutes of 30fps video. This seemed like a good fit for EC2 and parallelization, and in a nutshell it was no big deal and worked pretty well. Someone else is probably doing this too?
1. Start, say, 20 EC2 nodes
2. Log into each
3. Upload your exported .JAR
4. Split your input file amongst the servers
5. Run the app (it will call exit() when it finishes)
6. tar & gzip the output and download it.
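Step 4 is the only part that needs any thought, so here is a minimal sketch of one way to do it, not the code from my scripts. It cuts the input into contiguous, near-equal chunks, one per server. The function names and the chunk_NN.txt file names are made up for illustration.

```python
from pathlib import Path


def split_lines(lines, n_parts):
    """Split a list of lines into n_parts contiguous chunks of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(lines), n_parts)
    chunks = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional line each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def write_chunks(input_path, n_parts, out_dir):
    """Write one chunk file per server (file names are hypothetical)."""
    lines = Path(input_path).read_text().splitlines(keepends=True)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_lines(lines, n_parts)):
        path = out_dir / f"chunk_{i:02d}.txt"
        path.write_text("".join(chunk))
        paths.append(path)
    return paths
```

With 10k lines and 20 servers, each chunk holds 500 lines. Because the chunks are contiguous, sorting the downloaded output by chunk number and then by frame number should put the frames back in their original order.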
So, on 20 machines, it took less than a couple of minutes. I even screwed it up, uploaded a new jar, re-ran it, and was out of there in half an hour. All told, I played around and used 66 hours of computing in under a couple of hours, and it only cost $6. Granted, it took me a whole day to figure out the whole EC2 thing, because while I had read about it, I had never used it. I had previously messed with MTurk, though, and AWS in general is a lot of fun.
So, this is one way to do it. I could imagine other approaches, such as Hadoop, having the servers talk to each other over the network, or trying all of this on Google's AppEngine (not sure it would work; perhaps one could use ToxicLibs?). Either way, spreading the work out will definitely speed things up, or let you think about rendering ever crazier things. There are also other providers like Slicehost and Rackspace Cloud; YMMV.
The job will have to be parallelizable, however, which is not trivial for something like an ongoing blur, where each frame depends on the ones before it. I would love to hear ideas about that. I also tried running this as a regular Java program in NetBeans. Now that I have played with it more, that was nice to figure out but not strictly required, although it is indeed great to be able to expand these things out if they get complex.
I don't really have output to show yet, as I am still working on it, but the preliminary results are astounding. Here, however, is a dump of some Python code to start and stop apps, upload files, split a data file amongst servers, download files with the server name as a prefix, and a unified ssh client to connect to all of your running instances at once. I would like the client to use multiple threads (or otherwise generators) to speed up the initial login, but here you go:
gist.github.com/281504
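On the multiple-threads wish: here is a rough sketch of how the per-server work could run concurrently with Python's standard concurrent.futures module. This is not what the gist does. The names run_on_all and ssh_command are made up, and the ssh helper assumes the system ssh client with key-based login already set up (the root user name is a placeholder).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_all(hosts, action, max_workers=20):
    """Call action(host) for every host using a pool of threads.

    Returns a dict mapping host -> result, or host -> the exception raised.
    """
    results = {}
    if not hosts:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(action, host) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:  # keep going if one node fails
                results[host] = exc
    return results


def ssh_command(command, user="root"):
    """Build an action that runs `command` on a host via the ssh CLI.

    Assumption: key-based login works; BatchMode stops ssh from prompting.
    """
    def action(host):
        completed = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", command],
            capture_output=True, text=True, timeout=60,
        )
        return completed.returncode, completed.stdout
    return action
```

Something like run_on_all(hosts, ssh_command("uptime")) would then hit every node at once instead of one after another. Threads are enough here because the time goes to waiting on the network, not to Python itself.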
Comments, thoughts, ideas, greatly appreciated.