We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Processing on EC2 (Read 5032 times)
Processing on EC2
Jan 19th, 2010, 5:45pm
 
First off, if you have a little extra cash lying around, this is a very fun thing to do. I had been meaning to try this, but hadn't gotten around to it until now. I searched around and couldn't find anyone talking about it, so I thought I would share what I did this weekend. I wanted to render one 720p frame per line of data in a file of 10k lines. To keep my options open, I saved each frame as a PNG, which took a good second per frame even on my mini9's SSD. So that's about three hours for roughly 5 minutes of 30fps video. This seemed like a good fit for EC2 and parallelization, and in a nutshell it was no big deal and worked pretty well. Someone else is probably doing this too?

1. Start, say, 20 EC2 nodes
2. Log into each
3. Upload your exported .JAR
4. Split your input file amongst the servers
5. Run the app (it will exit() )
6. tar & gzip the output and download it.
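
Step 4 could be sketched roughly like this. This is only an illustration, not the code from my actual run: the function names and the part-NN.txt naming are made up for the example, and it assumes each server should get one contiguous block of lines so the frames stay in order.

```python
def split_lines(lines, n_servers):
    """Split a list of input lines into n_servers contiguous chunks.

    Chunk sizes differ by at most one, and concatenating the chunks
    in order gives back the original list.
    """
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    base, extra = divmod(len(lines), n_servers)
    chunks = []
    start = 0
    for i in range(n_servers):
        # The first `extra` servers take one additional line each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def write_chunks(input_path, n_servers, prefix="part"):
    """Write one chunk file per server (hypothetical names: part-00.txt, ...)."""
    with open(input_path) as f:
        lines = f.readlines()
    paths = []
    for i, chunk in enumerate(split_lines(lines, n_servers)):
        path = "%s-%02d.txt" % (prefix, i)
        with open(path, "w") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```

With 10k lines and 20 servers, each chunk comes out at 500 lines, and each chunk file is what you would upload to one node.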

So, on 20 machines, it took less than a couple of minutes. I even screwed it up, uploaded a new jar, re-ran it, and was out of there in half an hour. I had played around and used 66 hours of computing in under a couple of hours, and it only cost $6. Now, granted, it took me a whole day to figure out the EC2 thing, because while I had read about it, I hadn't used it, although I had previously messed with MTurk, so the AWS is a lot of fun.

So, this is one way to do it. I could imagine other scenarios such as Hadoop, or using network protocols between the servers, trying all of this on Google's AppEngine (not sure, one could use ToxicLibs perhaps?) ... and this will definitely speed up things, or allow you to think about rendering ever crazier things. There are other companies like Slicehost and Rackspace cloud, YMMV.

The job has to be parallelizable, however, which is not trivial for something like an ongoing blur, but I would love to hear ideas about that. I also tried this as a regular Java program in NetBeans. Now that I have played with it more, that was nice to figure out but not strictly required, although it is great for expanding these things out if they get complex.

I don't have the output really to show as I am still working on it, but the preliminary results are astounding. Here, however, is a dump of some Python code to start and stop apps, upload files, split a data file amongst servers, download files with the server name as a prefix, and a unified ssh client to connect to all of your running instances at once (I would like this to use multiple threads, or generators otherwise to speed up the initial login... but here you go...)

gist.github.com/281504
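
For the multi-threaded idea mentioned above, here is a minimal sketch of running one command on every instance at once. It is not the code from the gist: `run_on_all` and `ssh_runner` are hypothetical names, and it assumes the system `ssh` client is installed and already has key-based access to each host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run command on host via the system ssh client.

    Returns (exit code, combined stdout/stderr text).
    Assumes key-based login is already set up for the host.
    """
    result = subprocess.run(
        ["ssh", host, command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def run_on_all(hosts, command, runner=ssh_runner, max_workers=20):
    """Run command on every host concurrently; return {host: runner result}."""
    if not hosts:
        return {}
    workers = min(max_workers, len(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as hosts.
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))
```

The `runner` argument is there so the threading part can be tried out with a fake function before pointing it at real servers.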

Comments, thoughts, ideas, greatly appreciated.
Re: Processing on EC2
Reply #1 - Jan 20th, 2010, 12:28am
 
How do you figure it was 66 hours worth of computing?

10k lines, about a second per line:
10000 / 60 (seconds) / 60 (mins) = 2.8 hours

Still, sounds like an interesting project. Maybe useful for someone who wants to render something using p4sunflow, for example.

-spxl
Re: Processing on EC2
Reply #2 - Jan 20th, 2010, 1:52am
 
th0m wrote on Jan 19th, 2010, 5:45pm:
[...] mini9's SSD [...] EC2 [...] MTurk, so the AWS [...]

I believe I am quite up to date on computing news, but most of your post is... Turk (or Chinese, Hebrew, whatever language I don't grok) to me.

A vague reminiscence triggered by the "cloud" word suggests EC2 is a cloud computing service (Amazon? Google?), but providing a bit more context would help people understand your experiment, which seems interesting.

Thanks for sharing.
Re: Processing on EC2
Reply #3 - Jan 20th, 2010, 9:37am
 
subpixel -

Yes, I glossed over the numbers. My main thing was to use a hard number, so I looked at my bill, which says 66 CPU hours for $6. This represents a day of me playing with one or two boxes to get familiar with things, for maybe 5~6 hours or more, and it also includes running the block of 20 boxes for possibly a total of 3 hours. The specific example I cite ran 20 machines for about 1 hour, so at $0.11 per CPU hour (I can't remember, I might have been at $0.085/hr) it was the equivalent of running 1 machine for 20 hours, and the total price of that was probably about $2.20 (20 * $0.11). I also ran some other jobs, and the aforementioned time spent figuring all of this out is where the rest of my time and money went. I don't have a specific calculation for that, because I did a lot of things like playing with different disk images (I settled on the basic Fedora image), installing Java, installing Xvfb for running Processing headless, installing fonts into X, and making input and output directories.
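
Spelled out as a quick calculation (a sketch using only the figures quoted above; the function names are just for illustration):

```python
def instance_hours(machines, hours_each):
    """Billed hours: each machine is charged separately for its running time."""
    return machines * hours_each


def cost(hours, rate_per_hour):
    """Total charge in dollars for a number of instance-hours."""
    return hours * rate_per_hour


def implied_rate(total_cost, total_hours):
    """Average hourly rate implied by a bill."""
    return total_cost / total_hours


batch_hours = instance_hours(20, 1)       # the 20-machine run cited above
print(batch_hours)                        # instance-hours for that run
print(round(cost(batch_hours, 0.11), 2))  # dollars at the $0.11 figure
print(round(implied_rate(6.0, 66), 3))    # dollars per hour implied by the bill
```

The whole bill ($6 for 66 hours) works out to roughly $0.09 per hour, which is closer to the $0.085 figure than to $0.11.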


philho -

Yes, I apologize. I thought I knew a lot about computers before I dabbled in GIS and the Semantic Web, and these worlds almost have their own completely separate languages. Mechanical Turk is another Amazon service, but instead of using scripting to divide up time on virtual machines, you divide up a task amongst real people, whom you pay in small increments per task. EC2, the service I used with virtual machines, plus MTurk, plus a distributed storage system that backs a lot of your web browsing, make up what Amazon calls their "Web Services", so... AWS for short.

The Dell Inspiron 910 (aka the Mini 9) is a small "netbook" computer which at times can barely play a YouTube video, it seems, but is great for light web browsing and terminal screens, which is all you need to work with this stuff. It has a "Solid State Disk", which I guess you can think of as a beefed-up flash drive. These are rather expensive in price per megabyte but have no moving parts, and hence essentially no seek time, which can be a concern in larger situations where you are keenly observing the times of your input/output operations on a machine (or between several). The jury is out on their use right now, but this Dell thing has one.

I also had no idea who might read this, so I tried to be broad and specific at the same time, which maybe doesn't work. The example draws (in a somewhat bare-bones way) on the basic concepts of parallelizing a task on multiple processors (like threading, multicore programming like Apple's Grand Central Dispatch, or general-purpose graphics processing unit computing, GPGPU), or doing so with disparate machines or virtual machines that may come and go (like Facebook, Twitter, Google, Apache Hadoop, or basically the concept of Map / Reduce ... there is a good video as part of Stanford's video collection about MapReduce). In a nutshell, if you have a process with parts that could run in parallel, you "Map" the data to various nodes, sort the results of processing it, and then "Reduce" the outputs of those mapped data-to-machine paths in groups that make sense. Mostly this is all just a "Map" and possibly a sort, and then I just download all the data and "Reduce" it by renaming the output images, or rendering it into MPEG-4 with ffmpeg.
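
The "Reduce by renaming" step might look something like the sketch below. It is an assumption-laden illustration, not what I actually ran: it assumes the downloaded files are named with the server name as a prefix (e.g. node00_0000.png), that the servers are listed in the same order the input was split, and the frame%05d.png target pattern is made up for the example.

```python
def renumber_frames(outputs_by_server, pattern="frame%05d.png"):
    """Plan renames that merge per-server frames into one global sequence.

    outputs_by_server: ordered list of (server_name, [local file names]),
    in the same order the input data was split across servers.
    Returns a list of (downloaded_name, global_name) pairs.
    """
    renames = []
    n = 0
    for server, files in outputs_by_server:
        # Sort within a server so local frame order is preserved.
        for name in sorted(files):
            renames.append(("%s_%s" % (server, name), pattern % n))
            n += 1
    return renames


outputs = [
    ("node00", ["0001.png", "0000.png"]),
    ("node01", ["0000.png"]),
]
for old, new in renumber_frames(outputs):
    print(old, "->", new)
```

Applying each pair with os.rename gives one consecutively numbered image sequence, which is the kind of input ffmpeg can take when encoding to video.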

Hopefully that clears some things up?
Re: Processing on EC2
Reply #4 - Feb 1st, 2010, 11:56am
 
Here's the unfortunate outcome of my experiment.

http://vimeo.com/9123429

I say that because I threw this together quickly with the new OpenShot video editor, and things don't line up, as I might be missing a way to be more precise than what that editor seems to currently support.

This is a series of still-frame Processing sketches, but then iterated over the data of some 10,540 threads posted on a message board (essentially, the whole of 2009's activity).

Where EC2 really helped was to have an idea and not have to wait on the image sequence to render. So, while this thing as a whole doesn't seem like much, consider it as the result of iterative, ever building analysis.

Also, subpixel: thank you for your namesake. I wanted a way to show *all* of the 2500+ users acting at once, and a horizontal waterfall-like histogram / spectrum analysis using the individual r,g,b was, I guess, one way to do it. So thanks for chiming in, or else it wouldn't have occurred to me to use that here.
Re: Processing on EC2
Reply #5 - Feb 1st, 2010, 8:00pm
 
well, 5 months ago:
http://vimeo.com/6299622

a much better and more interesting use