We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
basic help with proHTML
Apr 16th, 2009, 11:28pm
 
I'm new to Processing and have been having a hard time getting proHTML to parse my data. The examples are all pretty simple, and while they work great untouched, I can't find any examples of how to deal with deeply nested elements (and I can't figure out at all how to get just the actual text content, which proHTML claims to be able to do). I'm not even sure which class I should be using: I've gone through nearly all of them at this point, and none has jumped out at me as the right one for extracting content.

My situation, in short: there's a table with a column of cells that have the class "dateCell", and I want the content inside those cells. But they're buried in body>div>div>div>table>tbody>tr>td.dateCell.

The farthest I've gotten: when I run the HtmlElementFinder example with my own page plugged in, it works great for finding all the links. But when I tweak it (below) to give me a table cell that's about 20 levels in, I just get a whole lotta null.

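import prohtml.*;

HtmlElementFinder htmlElementFinder;

void setup(){
  // enter your url here; "td" asks the finder for every table cell on the page
  htmlElementFinder = new HtmlElementFinder("my-url-is-here.html", "td");

  java.util.List cells = htmlElementFinder.getElements();

  for (int i = 0; i < cells.size(); i++){
    // getAttribute() looks up an attribute by name (e.g. "href" or "class");
    // "td.dateCell" is a selector, not an attribute name, so this prints null
    println(((StandAloneElement)cells.get(i)).getAttribute("td.dateCell"));
  }
}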

I know I'm probably not even using the right class, but I'm having a hard time figuring out which one to use. As an alternative to proHTML, I've tried parsing in Python, since I'm not all that picky and frankly think it makes more sense for Python to do the heavy lifting here. So if someone has a script I can just drop in, that'd be great too. I've also tried using XMLElement (which I've used successfully before), but that was a no-go.
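One possible drop-in, offered as a sketch and not as the proHTML way of doing it: it swaps proHTML for the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator with an HTMLEditorKit.ParserCallback) and collects the text of every td whose class list contains "dateCell". Because the post spells the class both "dateCell" and "datecell", the match here is case-insensitive. The names DateCellExtractor and extractDateCells are hypothetical, and the sketch assumes the date cells do not themselves contain nested tables (an inner </td> would end the cell early).

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text of every <td class="dateCell"> on a page.
public class DateCellExtractor {

  static class CellCollector extends HTMLEditorKit.ParserCallback {
    private final String targetClass;
    final List<String> cells = new ArrayList<String>();
    private StringBuilder current = null; // non-null only while inside a matching <td>

    CellCollector(String targetClass) {
      this.targetClass = targetClass;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
      if (tag == HTML.Tag.TD && hasClass(attrs)) {
        current = new StringBuilder();
      }
    }

    @Override
    public void handleText(char[] data, int pos) {
      if (current != null) {
        current.append(data).append(' '); // separate text chunks (e.g. around <b>)
      }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
      if (tag == HTML.Tag.TD && current != null) {
        // collapse the whitespace left between chunks, then store the cell text
        cells.add(current.toString().replaceAll("\\s+", " ").trim());
        current = null;
      }
    }

    // class="a b c" may list several names; compare each one, ignoring case
    private boolean hasClass(MutableAttributeSet attrs) {
      Object value = attrs.getAttribute(HTML.Attribute.CLASS);
      if (value == null) return false;
      for (String name : value.toString().trim().split("\\s+")) {
        if (name.equalsIgnoreCase(targetClass)) return true;
      }
      return false;
    }
  }

  public static List<String> extractDateCells(Reader html) throws IOException {
    CellCollector collector = new CellCollector("dateCell");
    new ParserDelegator().parse(html, collector, true); // true: ignore <meta> charset
    return collector.cells;
  }

  public static List<String> extractDateCells(String html) {
    try {
      return extractDateCells(new StringReader(html));
    } catch (IOException e) {
      throw new RuntimeException(e); // a StringReader should not fail in practice
    }
  }

  // Usage: java DateCellExtractor http://example.com/page.html
  public static void main(String[] args) throws IOException {
    Reader in = new InputStreamReader(URI.create(args[0]).toURL().openStream(), "UTF-8");
    try {
      for (String cell : extractDateCells(in)) {
        System.out.println(cell);
      }
    } finally {
      in.close();
    }
  }
}
```

Run it with the page URL as the only argument and it prints one date cell per line. Since it only uses JDK classes, the same callback approach could also be called from a sketch instead of HtmlElementFinder; the parser does not care how deeply the table is nested, because it reacts to each td as it streams past.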