Processing 1.0 - Processing Discourse - Reading HTML Text Content



hector

Reading HTML Text Content
Dec 3rd, 2007, 2:03pm

Hi processing pros!
I am a newbie and I'm trying to read some text from an HTML page.

The page is, for example:
http://www.sternmotor.de/test.htm

I am using proHTML to read it and it works fine so far, but:
After reading the text I need to separate the content at the commas that are already in the page. proHTML strips the commas, and also special characters such as ü, ä and ö.
So it splits the text word by word, and not comma by comma.

Question:
Can anybody give me a hint on how to read the complete text, with all commas and special characters?

Thanks!

fjen

Re: Reading HTML Text Content
Reply #1 - Dec 4th, 2007, 9:20am

hi "hector"! ;)

something like this?
Code:


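// load the page source as an array of lines and glue it back into one string
String[] lines = loadStrings( "http://www.sternmotor.de/test.htm" );
String html_source = join( lines, "\n" );

// find the end of the opening <body ...> tag
int index1 = html_source.indexOf( "<body" ) + 5; // 5 = length of "<body"
int index2 = html_source.indexOf( ">", index1 ) + 1;

// the text ends where this page's "Render time" comment starts
int index3 = html_source.indexOf( "<!-- Render time:", index2 );
if ( index3 < 0 ) index3 = html_source.length(); // marker missing: take the rest

String txt = html_source.substring( index2, index3 );
txt = trim(txt);

txt = join( split(txt, "<br>"), "\n" ); // replace "<br>" with newlines

println( txt );

// split at the commas; each piece keeps its surrounding whitespace
println( split( txt, "," ) );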
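
If the umlauts still come out wrong, one possible cause (an assumption, I can't tell from here how the page is encoded) is that the bytes get decoded with the wrong charset. Below is a rough plain-Java sketch of the same steps that decodes the bytes with an explicit charset, stops at "</body" instead of the "Render time" comment, and trims each comma-separated piece. The class name BodyText, its method names and the sample page in main are made up for illustration; ISO-8859-1 is only a guess at the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BodyText {

    // Returns the text between the end of the opening <body ...> tag and
    // the closing </body tag, or "" if either one is missing.
    public static String extractBody(String html) {
        int open = html.indexOf("<body");
        if (open < 0) return "";
        int start = html.indexOf('>', open);
        if (start < 0) return "";
        start += 1;
        int end = html.indexOf("</body", start);
        if (end < 0) return "";
        return html.substring(start, end).trim();
    }

    // Replaces <br>, <br/> and <br /> (any letter case) with newlines.
    public static String brToNewline(String text) {
        return text.replaceAll("(?i)<br\\s*/?>", "\n");
    }

    // Splits at commas, trims each piece and drops empty pieces.
    public static List<String> splitOnCommas(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split(",")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) parts.add(trimmed);
        }
        return parts;
    }

    // Decodes the raw page bytes with the given charset, then runs the steps above.
    public static List<String> parse(byte[] rawPage, Charset charset) {
        String html = new String(rawPage, charset);
        return splitOnCommas(brToNewline(extractBody(html)));
    }

    public static void main(String[] args) {
        // Hypothetical page content standing in for the real test.htm.
        // \u00fc = ü, \u00e4 = ä, \u00d6 = Ö
        String sample = "<html><body bgcolor=\"#fff\">"
                + "M\u00fcller, K\u00e4se,<br>\u00d6l</body></html>";
        // Assumption: the server sends ISO-8859-1 encoded bytes.
        byte[] raw = sample.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parse(raw, StandardCharsets.ISO_8859_1));
    }
}
```

In a real sketch the byte array could come from loadBytes() instead of the hard-coded sample, and the charset would have to match whatever the page actually uses.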

best
F

hector

Re: Reading HTML Text Content
Reply #2 - Dec 4th, 2007, 11:04am

O man, it works perfectly!
Thank you so much!

Double-IP-Hector
