I'm new to Processing and have been having a hard time getting proHTML to parse my data. The examples are all pretty simple, and while they work fine untouched, I can't find any that show how to deal with deeply nested elements. I also can't figure out how to get at an element's actual text content, which proHTML claims to be able to do. I'm not even sure which class I want: I've gone through nearly all of them at this point and none has jumped out as the right one for extracting content.
My situation, in short: there's a table with a column of cells of class "dateCell", and I want the content inside those cells. But they're buried in body>div>div>div>table>tbody>tr>td.dateCell, and I just get a whole lot of null.
The farthest I've gotten is running the HtmlElementFinder example with my own page plugged in: it works great for finding all the links. But when I tweak it (below) to give me a table cell that's something like 20 levels deep, every result is null.
import prohtml.*;

HtmlElementFinder htmlElementFinder;

void setup() {
  // enter your url here
  htmlElementFinder = new HtmlElementFinder("my-url-is-here.html", "td");
  java.util.List links = htmlElementFinder.getElements();
  for (int i = 0; i < links.size(); i++) {
    println(((StandAloneElement) links.get(i)).getAttribute("td.dateCell"));
  }
}
I know I'm probably not even using the right class, but I'm having a hard time figuring out which one to use. As an alternative to proHTML, I've tried parsing in Python, since I'm not all that picky and frankly think it makes more sense for Python to do the heavy lifting here. So if someone has a script I can just drop in, that'd be great too.
I've also tried using XMLElement (which I've used before successfully) but that was a no-go.
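For the Python route, this is roughly the kind of script I have in mind. It's only a minimal sketch using the standard library's html.parser, and it assumes the cells are marked up as <td class="dateCell"> with explicit closing </td> tags. The names DateCellParser and extract_date_cells, and the sample markup, are made up for illustration and are not taken from my real page.

```python
from html.parser import HTMLParser


class DateCellParser(HTMLParser):
    """Collects the text inside every <td> whose class list contains target_class."""

    def __init__(self, target_class="dateCell"):
        super().__init__()
        self.target_class = target_class  # assumed class name, taken from the post
        self.depth = 0      # > 0 while we are inside a matching <td>
        self.buffer = []    # text fragments of the cell currently being read
        self.cells = []     # finished cell contents, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self.depth:
            # A <td> of a table nested inside a matching cell.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.cells.append("".join(self.buffer).strip())

    def handle_data(self, data):
        # Only keep text while inside a matching cell.
        if self.depth:
            self.buffer.append(data)


def extract_date_cells(html, target_class="dateCell"):
    """Return the text of each matching cell as a list of strings."""
    parser = DateCellParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.cells


# Made-up markup that mimics the nesting described above.
SAMPLE_HTML = """
<body><div><div><div><table><tbody>
<tr><td class="dateCell">2011-03-01</td><td>foo</td></tr>
<tr><td class="dateCell other"> 2011-03-02 </td><td>bar</td></tr>
</tbody></table></div></div></div></body>
"""

if __name__ == "__main__":
    for cell in extract_date_cells(SAMPLE_HTML):
        print(cell)
```

The nesting depth of the table shouldn't matter here, because the parser reacts to every td it sees no matter how many divs surround it. For a live page, the HTML string could come from urllib.request.urlopen(url).read().decode("utf-8"), assuming the page is UTF-8.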