Internet data scraping - easy or hard with Processing?

I'm just taking a look at Processing. I'm most interested in dynamic data visualization for social change. Can Processing go out on the web and scrape or mash up site data easily? I took a quick look at the libraries but didn't see much for that. I don't want to have to learn yet another language (or relearn one) just to do the data scraping. I know some Python, but I don't want to be switching back and forth.

Answers

  • Take a look at the Processing example "XMLYahooWeather". Maybe it will spark some ideas. Here it is:

    /**
     * Loading XML Data
     * by Daniel Shiffman.  
     * 
     * This example demonstrates how to use loadXML()
     * to retrieve data from an XML document via a URL
     */
    
    // We're going to store the temperature
    int temperature = 0;
    // We're going to store text about the weather
    String weather = "";
    
    // The zip code we'll check for
    String zip = "10003";
    
    PFont font;
    
    void setup() {
      size(600, 360);
    
      font = createFont("Merriweather-Light.ttf", 28);
      textFont(font);
    
  // The URL for the XML document
  // (note: Yahoo has since retired this public feed; substitute any XML source)
  String url = "http://xml.weather.yahoo.com/forecastrss?p=" + zip;
    
      // Load the XML document
      XML xml = loadXML(url);
    
      // Grab the element we want
      XML forecast = xml.getChild("channel/item/yweather:forecast");
    
      // Get the attributes we want
      temperature = forecast.getInt("high");
      weather = forecast.getString("text");
    }
    
    void draw() {
      background(255);
      fill(0);
    
      // Display all the stuff we want to display
      text("Zip code: " + zip, width*0.15, height*0.33);
      text("Today’s high: " + temperature, width*0.15, height*0.5);
      text("Forecast: " + weather, width*0.15, height*0.66);
    
    }
    
  • Parsing XML is a bit different from parsing HTML (unless it's XHTML, of course).

    For the latter, you can take a look at the jsoup Java library.

  • I would definitely recommend Python for the scraping part, especially this module.

    Just dump the data you need to JSON (or XML, for that matter), and read it with Processing.
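
    A minimal sketch of that workflow, using only Python's standard library (html.parser and json) rather than any particular third-party module, with a made-up snippet of HTML standing in for a real page:

```python
import json
from html.parser import HTMLParser

# Collect the text of every <h2> headline from a (hypothetical) page.
class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears inside an <h2> element
        if self.in_h2:
            self.headlines.append(data.strip())

html = "<html><body><h2>Rising sea levels</h2><p>...</p><h2>Food deserts</h2></body></html>"
parser = HeadlineParser()
parser.feed(html)

# Dump the extracted data to a JSON file.
with open("headlines.json", "w") as f:
    json.dump(parser.headlines, f)
```

    A Processing sketch can then pick the file up with loadJSONArray("headlines.json") and draw from it.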

  • I've done this a couple of times and, as PhiLo mentioned, jsoup is the way to do it (or any other similar library that suits you).

    Bear in mind that if you want to collect data from a website by visiting multiple pages, you should wait a few seconds (say, 5) before moving on to the next one. Each website's robots.txt file stores the rules about what you're allowed to crawl (and sometimes how long to wait between requests).

    Moreover, I think this kind of visualization requires threads, so the sketch doesn't freeze while it's fetching data.
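
    Checking those robots.txt rules doesn't need a third-party library either — Python's standard urllib.robotparser handles it. A small sketch, using a made-up rules file for illustration:

```python
from urllib import robotparser

# A made-up robots.txt for illustration.
rules = """User-agent: *
Crawl-delay: 5
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Which URLs is a generic crawler allowed to fetch?
print(rp.can_fetch("*", "https://example.com/data/page1.html"))  # True
print(rp.can_fetch("*", "https://example.com/private/secret"))   # False

# How long the site asks crawlers to wait between requests (seconds).
print(rp.crawl_delay("*"))  # 5
```

    In practice you would point RobotFileParser at the real file with set_url("https://example.com/robots.txt") and read() before crawling.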
