We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Reading HTML Text Content
Dec 3rd, 2007, 2:03pm
 
Hi Processing pros!
I am a newbie and I am trying to read some text from an HTML website.

The website is, for example:
http://www.sternmotor.de/test.htm

I am using proHTML to read it, and it works fine so far, but:
After reading the text I need to separate the content at the commas that are already present on the website. But proHTML just cuts off the commas and also special characters such as ü, ä and ö.
So it separates the text word by word, and not comma by comma.

Question:
Can anybody give me a hint on how to read the complete text with all commas and characters?

Thanks!
Re: Reading HTML Text Content
Reply #1 - Dec 4th, 2007, 9:20am
 
Hi "hector"! ;)

Something like this?
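Code:

// Download the page; loadStrings() returns one String per line
String[] lines = loadStrings( "http://www.sternmotor.de/test.htm" );
String html_source = join( lines, "\n" );

// Find the end of the opening <body ...> tag and the trailing render-time comment
int index1 = html_source.indexOf( "<body" ) + 5; // 5 = length of "<body"
int index2 = html_source.indexOf( ">", index1 ) + 1;
int index3 = html_source.indexOf( "<!-- Render time:", index2 );
if ( index3 < 0 ) index3 = html_source.length(); // marker missing: take the rest of the page

String txt = trim( html_source.substring( index2, index3 ) );
txt = join( split( txt, "<br>" ), "\n" ); // replace "<br>" with newlines

println( txt );
println( split( txt, "," ) ); // split at the commas only, not word by word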
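
If it helps, here is the same idea as a plain-Java sketch that can be pasted into a separate tab of a sketch or used on its own. It is only an illustration under a few assumptions: the class name BodyText and both method names are made up for this example (they are not part of Processing or proHTML), the end marker is passed in rather than hard-coded, and the code falls back to "</body" or the end of the page when the marker is missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Processing or proHTML.
final class BodyText {

    // Returns the text between the end of the opening <body ...> tag and
    // endMarker. If endMarker is null or not found, "</body" is used, and
    // failing that the end of the input. <br>, <br/> and <BR> become newlines.
    // Returns an empty string when there is no <body tag at all.
    static String extractBody(String html, String endMarker) {
        int bodyTag = html.indexOf("<body");
        if (bodyTag < 0) return "";
        int start = html.indexOf('>', bodyTag);
        if (start < 0) return "";
        start += 1; // first character after the opening tag

        int end = (endMarker == null) ? -1 : html.indexOf(endMarker, start);
        if (end < 0) end = html.indexOf("</body", start);
        if (end < 0) end = html.length();

        String text = html.substring(start, end);
        text = text.replaceAll("(?i)<br\\s*/?>", "\n");
        return text.trim();
    }

    // Splits at commas only, trims whitespace around each piece and
    // drops pieces that are empty after trimming.
    static List<String> splitOnCommas(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split(",")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) parts.add(trimmed);
        }
        return parts;
    }
}
```

In a sketch this could be called as BodyText.splitOnCommas( BodyText.extractBody( join( lines, "\n" ), "<!-- Render time:" ) ). Because the text is only cut at fixed markers and at commas, characters such as ü, ä and ö pass through untouched, provided the page was decoded correctly when it was loaded.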


best
F
Re: Reading HTML Text Content
Reply #2 - Dec 4th, 2007, 11:04am
 
Oh man, it works perfectly!
Thank you so much!

Double-IP-Hector