This is the archive Discourse for the Processing (ALPHA) software.
Please visit the new Processing forum for current information.

   Author  Topic: Can I extract text from a web site?  (Read 2926 times)
nicholasjones


Can I extract text from a web site?
« on: Jan 25th, 2005, 11:19am »

I would like to make a Processing app that can copy text from a web site and then present it on the stage with the font/size/colour that I select.
 
Is this possible in Processing? I have looked around, but have not found any guides.
 
thanks.
 

....by_nic
eskimoblood

Re: Can I extract text from a web site?
« Reply #1 on: Jan 25th, 2005, 11:57am »

You can access every part of an HTML document through the netscape.javascript package (its JSObject class). Take a look at this:
http://www.glacier.rice.edu/~mhardy/jsref/pkg1.htm
 
 
 
nicholasjones


Re: Can I extract text from a web site?
« Reply #2 on: Jan 25th, 2005, 12:00pm »

Thanks a bunch, eskimo, I'll take a bash and post back!
 

....by_nic
TomC

Re: Can I extract text from a web site?
« Reply #3 on: Jan 25th, 2005, 12:47pm »

I think the JavaScript class will only let you access the text of the page which contains the applet.
 
Processing can use loadStrings() (and loadImage(), etc.) to access a file from the same domain as the applet was served from.
 
If you want to access remote sites (e.g. news headlines, images, etc.) you will either need a signed applet (which is tricky) or to use a scripting language such as PHP to redirect your request.
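
For the same-domain case, here is a rough plain-Java sketch of what a loadStrings-style call boils down to: open a stream and collect it line by line. The class name LineLoader and its readLines methods are made up for illustration and are not part of Processing; inside a sketch you would simply call loadStrings() instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Processing.
public class LineLoader {

    // Read every line from a character source into a list.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Same thing for a URL. Assumes the text is UTF-8. In an unsigned
    // applet this only works for the host the applet was served from.
    static List<String> readLines(URL url) throws IOException {
        return readLines(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = (args.length > 0)
                ? readLines(URI.create(args[0]).toURL())              // URL given on the command line
                : readLines(new StringReader("first line\nsecond line")); // offline demo
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Once you have the lines, drawing them with your own font, size and colour is ordinary Processing text handling.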
 
nicholasjones


Re: Can I extract text from a web site?
« Reply #4 on: Jan 27th, 2005, 2:09pm »

Thanks TomC, but what do you mean by using PHP to redirect the request? Would this be a fast process? Fast as in running time, not the time to build it.
 
Thanks again.
 

....by_nic
TomC

Re: Can I extract text from a web site?
« Reply #5 on: Jan 27th, 2005, 3:18pm »

There's some more discussion of these issues here... including a really minimal PHP script by toxi which will do what you need...
 
http://processing.org/discourse/yabb/board_Syntax_action_display_num_1103112579.html
 
I would personally limit what you allow the script to fetch, as it could be open to abuse otherwise.
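
To show what "limit what you allow the script to fetch" could look like, here is a small allow-list check. It is written in Java rather than PHP and is not toxi's script; the class name FetchAllowList and the host names are placeholders. The idea carries over to any language: parse the requested address, and refuse anything that is not plain http/https on a host you have listed yourself.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical allow-list for a fetch/redirect script.
public class FetchAllowList {

    // Placeholder hosts: replace with the sites you actually want to relay.
    static final Set<String> ALLOWED_HOSTS = Set.of("example.com", "news.example.org");

    // True only for http/https URLs whose host is on the list.
    static boolean isAllowed(String requested) {
        try {
            URI uri = new URI(requested);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return false; // relative paths, file: URLs, etc.
            }
            boolean web = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
            return web && ALLOWED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // not a well-formed address at all
        }
    }

    public static void main(String[] args) {
        for (String candidate : args) {
            System.out.println(candidate + " -> " + isAllowed(candidate));
        }
    }
}
```

The script would run this check before fetching anything and return an error for everything else, so nobody can use your server as a general-purpose relay.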
 
st33d

Re: Can I extract text from a web site?
« Reply #6 on: Jan 27th, 2005, 6:05pm »

This is as idiot-proof as I could make it. Signing (certifying) Java applets on a PC:
 
1: Download JDK and install.
 
2: Navigate to the bin directory (with me it's on my D: partition under \Program Files\Java\jdk1.5.0\bin)
 
3: It's a good idea at this point to make a shortcut of this directory and put it in your sketchbook folder.
 
4: Export your applet and copy the ".jar" file. Make sure the applet name is 8 characters or fewer. It makes what comes next a little easier.
 
5: Open the DOS prompt. You do this by going to your Start bar and clicking the button named "Run". You can start the prompt with the line "command", but I had trouble with long file names with that version (that's why you should keep the applet names short). So start up the DOS prompt with cmd.exe to make navigation simpler.
 
6: Navigate in DOS to your bin directory. You navigate in DOS with the command "cd" followed by the directory. "cd .." takes you up a directory. If you want to change partition, simply enter the letter followed by a colon (no need for cd, e.g. "c:"). Type in "dir" to look at the current directory contents. DOS is funny about long file names, but you can get around this by putting the directory in quotes after cd (e.g. cd "My Documents"); you don't even have to use the last quotation mark. Even faster is to open the directory you want to go to in an Explorer window and drag the folder icon by the address to the DOS window. You can press shift+up arrow to call up what you last typed in if DOS gets a bit finicky with you. (I have to navigate to d:, then drag the file name to DOS, put "cd" before it and edit off d: at the beginning to get any love.)
 
7: Once you're in the bin directory you should have the two programs you need (keytool.exe and jarsigner.exe).
 
http://java.sun.com/j2se/1.4.2/docs/tooldocs/solaris/keytool.html
 
http://java.sun.com/j2se/1.4.2/docs/tooldocs/solaris/jarsigner.html
 
From what I've seen other people do in DOS you can type in the name of the program followed by the operations you want that program to perform. Now this is the confusing part. One tutorial told me to type in "keytool -keygen".
 
http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html
 
If you type in the wrong command you'll get a list of viable commands and the correct syntax to call them. On mine with the latest JDK it's:
"keytool -genkey"
Sometimes it lets me use the first name entry as an alias, sometimes it doesn't.
 
8: So to be safe type in this:
"keytool -genkey -alias myalias"
You'll be asked for a password. Make it simple, because you can always delete the ".keystore" file in the bin directory (or Documents and Settings; do a search for the file if it's being a pain) and start again once you've familiarised yourself. I've kept the second password the same because self-certification really hates me.
 
9: Now self-certify your java applet.
"keytool -selfcert -alias myalias"
Normally you would pony up some dough for a proper certificate saying you're not evil, but self-certifying still lets you offer people your risky software to try.
 
10: Now type:
"jarsigner myjar.jar myalias"
 
Now you can copy your signed java applet back to your applet folder. Type in "exit" to close the DOS prompt.
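
If you want to double-check that the signing took, "jarsigner -verify myjar.jar" should report on it. The same check can be roughed out in Java with the standard java.util.jar classes, as below. The class name SignedJarCheck is made up, and this is only a sketch: it tells you that every file in the jar carries a signature, not that anybody trusts the self-made certificate behind it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical checker for the jar produced in step 10.
public class SignedJarCheck {

    // True if the jar has at least one non-META-INF file and all such files are signed.
    static boolean isFullySigned(String path) throws IOException {
        try (JarFile jar = new JarFile(path, true)) { // true = verify signatures while reading
            boolean sawPayload = false;
            byte[] buffer = new byte[8192];
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                    continue; // the signature files themselves are not signed
                }
                // Signer information only becomes available after the entry
                // has been read to the end.
                try (InputStream in = jar.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // discard the bytes, we only need the verification side effect
                    }
                }
                if (entry.getCodeSigners() == null) {
                    return false; // found an unsigned file
                }
                sawPayload = true;
            }
            return sawPayload;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java SignedJarCheck myjar.jar");
            return;
        }
        System.out.println(args[0] + " fully signed: " + isFullySigned(args[0]));
    }
}
```

A jar that was tampered with after signing makes the read throw a SecurityException instead of returning false, which is also worth knowing about.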
 
If anyone can amend this to make it simpler or correct please do. And try not to publish anything harmful.
 

I could murder a pint.