We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
ProHtml help please (Read 1798 times)
ProHtml help please
Jan 12th, 2007, 3:58am
 
I am trying to use proHTML's HtmlList to extract live data from a website.

import prohtml.*;

HtmlList htmlList;

void setup(){
 size(100,100);
 //enter your url here
 htmlList = new HtmlList("http://www.yoururl.com");

 for (int i = 0;i<htmlList.pageList.size();i++){
   println(htmlList.pageList.get(i));
 }
}

However, any - or . characters inside an element are removed, e.g.
-5.0    becomes    50

Is there any way I can list everything on the page without this filtering?

Thanks
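
One way to keep the sign and the decimal point is to skip the HTML parser and pull numbers straight out of the raw page text. The following is a plain-Java sketch under that assumption, not a fix for proHTML itself; the class name, the method name and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumberScan {
  // Optional minus sign, digits, optional fractional part.
  static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

  // Returns every number found in one line of raw page text, in order.
  static List<Double> numbersIn(String line) {
    List<Double> found = new ArrayList<Double>();
    Matcher m = NUMBER.matcher(line);
    while (m.find()) {
      found.add(Double.parseDouble(m.group()));
    }
    return found;
  }

  public static void main(String[] args) {
    // Hypothetical line of markup, standing in for one line of the downloaded page.
    String line = "<td>-5.0</td><td>12</td>";
    System.out.println(numbersIn(line)); // prints [-5.0, 12.0]
  }
}
```

Because the pattern treats any hyphen directly in front of a digit as a minus sign, text such as "page-5" would also yield -5, so it is worth narrowing the search to the lines that actually hold the data.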
Re: ProHtml help please
Reply #1 - Sep 19th, 2007, 6:54pm
 
I'm trying to do the same thing. Did anyone ever answer your question? Did you find another solution?
Re: ProHtml help please
Reply #2 - Sep 19th, 2007, 11:52pm
 
Unfortunately not. I just ended up making it assume all results were negative, which gave me the occasional false reading. Luckily my audience were none the wiser.
Perhaps this should be brought to the attention of the author of the proHtml library.
Re: ProHtml help please
Reply #3 - Sep 20th, 2007, 12:14am
 
I find that the problem with proHTML is that although the documentation is thorough, the examples are very few and limited to the basics. Also, I haven't found much other code out there where people are using it. I tend to learn by example, so it has been tricky for me to figure out how to do even basic things, considering I am, at best, average at programming.

I was trying to grab data values from a web page and would like to just do this:

1. GET url
2. Locate a target text string e.g., "Temperature:"
3. Grab an adjacent data value and stuff it into a variable

Seems like a pretty trivial task, really, but I haven't figured out how to do it with proHTML.

Any thoughts?
Re: ProHtml help please
Reply #4 - Sep 20th, 2007, 4:49pm
 
@facade can you give me a link to the page that you parsed, and your program? You can send me both by mail, then I will try to check this.

@thatbrock
You are right, maybe I should include some more advanced examples. So far, in my experience, HTMLList works best. First you parse your document into a HTMLList, and then you print the elements and their indices. Now you look up the indices of the elements you are interested in and pass their values to your variables. This works well if the structure is always the same; I used it to read out data from multiple documents.

Another way is to use the HTMLElementFinder and search for the ElementType that contains your values. I already have a modified version where you can also add arguments and values, to search for certain ids for example. The problem is that I use Java 1.5 in the new version, so it won't work with Processing, but I can include it if you need it.
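
The index-based approach described in this reply (parse, print each element with its index, then read the values at the indices you noted) might look roughly like this. It is a plain-Java sketch, not proHTML code: the list `entries` is a hypothetical stand-in for the parsed page, and `dump` and `valueAt` are made-up helper names.

```java
import java.util.Arrays;
import java.util.List;

class IndexLookup {
  // Prints each entry with its index so the interesting positions can be read off.
  static void dump(List<String> entries) {
    for (int i = 0; i < entries.size(); i++) {
      System.out.println(i + ": " + entries.get(i));
    }
  }

  // Returns the entry at a known index, or null if the page is shorter than expected.
  static String valueAt(List<String> entries, int index) {
    return (index >= 0 && index < entries.size()) ? entries.get(index) : null;
  }

  public static void main(String[] args) {
    // Hypothetical parsed page, standing in for the entries of a parsed document.
    List<String> entries = Arrays.asList("<td>", "Temperature:", "</td>", "<td>", "72", "</td>");
    dump(entries);
    // Index 4 was read off the dump above; it only stays valid while the page layout is unchanged.
    String temperature = valueAt(entries, 4);
    System.out.println("temperature = " + temperature);
  }
}
```

The bounds check in `valueAt` is there because a fixed index silently points at the wrong element, or past the end, as soon as the page structure shifts.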
Re: ProHtml help please
Reply #5 - Sep 20th, 2007, 5:33pm
 
tex,

If you would post a code sample here of the HTMLList grabbing some values from a web page, that would be great. Then others could also benefit from it.

I am able to get HTMLList to list the elements, but I can't figure out how to use it to return a value for a specific element. It's also a problem if the structure changes slightly.

Do you have an example of code that will do what I suggested in my previous post? For example:

1. GET url, i.e. htmlList = new HtmlList("http://www.weather.com");

2. Iterate through HTMLList to find a match for a text string known to be on the target page, e.g., "Temperature:"

3. Grab an adjacent data value (i.e., the matched text plus 2 extra characters), snip out the characters after the matched string, and stuff them into a variable...

Thanks!
-B
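
The three steps listed in this post can be sketched in plain Java, leaving out the download itself. This is only an illustration under stated assumptions: the page text, the class name `LabelGrab` and the helper `after` are hypothetical, and a real sketch would first fetch the text from the URL.

```java
class LabelGrab {
  // Returns up to `count` characters that follow `label` in `text`, trimmed,
  // or null when the label does not occur.
  static String after(String text, String label, int count) {
    int at = text.indexOf(label);
    if (at < 0) return null;
    int start = at + label.length();
    int end = Math.min(start + count, text.length());
    return text.substring(start, end).trim();
  }

  public static void main(String[] args) {
    // Hypothetical page text; a real sketch would fetch it from the URL first.
    String page = "Humidity: 40% Temperature: 72 F";
    System.out.println(after(page, "Temperature:", 3)); // prints 72
  }
}
```

The count is 3 rather than 2 here because the sample text has a space between the label and the value; the trim removes it. Clamping the end index keeps the call safe when the label sits at the very end of the text.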
Re: ProHtml help please
Reply #6 - Oct 9th, 2007, 7:20pm
 
Yes, it would be great if you could post an example of parsing a website with proHtml!
thanks
Re: ProHtml help please
Reply #7 - Oct 10th, 2007, 12:36am
 
I have found a good way of parsing data from a website that does not use proHtml at all. One advantage of not using an external library is that it works well as a Java applet on a web page. Here is the code, which any and all are welcome to use.

// an example for data from www.weather.com
// by Dan de Waal, 10 Oct '07

String URL = "http://www.weather.com/weather/local/USCA0027"; // Anaheim, CA
String search = "<B CLASS=obsTempTextA>"; // tag that precedes the temperature (22 chars long).
String tempstring;
int tempnum;
String[] data;

void setup(){
 size(100,100);
 data = loadStrings(URL);  // I found this to be much better than HtmlList from proHtml.
 if (data == null) { println("could not load " + URL); return; } // loadStrings returns null on failure.
 
 for (int i=0; i<data.length; i++){
   int t = data[i].indexOf(search);  // position of the search string within the line; -1 means not found.
   int start = t + search.length();  // first character after the search string.
   if(t > -1 && data[i].length() >= start + 2){ // search string found, with at least 2 chars after it.
     //println(i + ": " + data[i]);
     tempstring = data[i].substring(start, start + 2);  // the 2 characters right after the search string.
   }
 }
 
 if (tempstring != null) { // only convert if the search string was found.
   tempnum = int(tempstring); // convert to an integer.
   println(tempnum); // result.
 }
}
Re: ProHtml help please
Reply #8 - Oct 10th, 2007, 4:58am
 
Here is the same program using proHtml.
Notice that the search string changes from 'obsTempTextA' with loadStrings to 'obstemptexta' with proHtml.

// an example for data from www.weather.com
// by Dan de Waal, 10 Oct '07

import prohtml.*;

HtmlList htmlList;

String URL = "http://www.weather.com/weather/local/USCA0027"; // Anaheim, CA
String tempstring;
int tempnum;

void setup(){
 size(100,100);
 htmlList = new HtmlList(URL);
 
 for (int i=0; i<htmlList.pageList.size()-1; i++){ // stop one short, because we read record i+1.
   String ts = htmlList.pageList.get(i).toString();
   int t = ts.indexOf("obstemptexta");  // position of the search string within the record; -1 means not found.
   if(t > -1){ // search string found.
     tempstring = htmlList.pageList.get(i+1).toString(); // store the next record into tempstring.
   }
 }
 
 if (tempstring != null && tempstring.length() >= 2) { // only convert if a value was found.
   tempnum = int(tempstring.substring(0,2)); // convert the first 2 characters to an integer.
   println(tempnum); // result.
 }
}
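
Since the search string differs in case between the two programs ('obsTempTextA' versus 'obstemptexta'), a case-insensitive search would let the same string work on either kind of input. A minimal plain-Java sketch, with a hypothetical helper name `indexOfIgnoreCase` and made-up sample lines:

```java
import java.util.Locale;

class CaseInsensitiveFind {
  // Like String.indexOf, but ignores letter case in both arguments.
  static int indexOfIgnoreCase(String text, String needle) {
    return text.toLowerCase(Locale.ROOT).indexOf(needle.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // Hypothetical lines in the two casings mentioned above.
    String raw = "<B CLASS=obsTempTextA>72";
    String lowered = "<b class=obstemptexta>72";
    System.out.println(indexOfIgnoreCase(raw, "obsTempTextA"));     // prints 9
    System.out.println(indexOfIgnoreCase(lowered, "obsTempTextA")); // prints 9
  }
}
```

Lower-casing only affects where the match is found; the value itself should still be cut from the original string, so its characters are left untouched.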