We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Is it possible to read online HTML code?
Sep 27th, 2009, 5:13am
 
Hey,

I'm just a newbie in Processing, so I'd be happy to get some help :)

I want to read the HTML code of a web page, search it for a picture, and then load that picture in Processing.

But how can I read or get the HTML code of a page? I want to skip the step of opening the page in Firefox and viewing its source.

In C# you can use "WebClient" and its "DownloadString" method. What's the equivalent in Processing? :D

Thanks :)
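Inside a Processing sketch, the built-in loadStrings() already accepts a URL and returns the page as an array of lines, and loadImage() can load an image directly from a URL. Outside the PDE, a minimal plain-Java sketch of the DownloadString idea could look like the following. It assumes Java 9 or later (for readAllBytes) and a UTF-8 page; the class PageFetcher, its method names, and the example.com address are illustrative only, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Rough Java counterpart to C#'s WebClient.DownloadString:
// download a page and return its HTML as a single String.
class PageFetcher {

  // Reads the whole stream, decodes it as UTF-8, and closes the stream.
  // Assumption: the page is UTF-8 encoded.
  static String readAll(InputStream in) {
    try (InputStream stream = in) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Opens an absolute http(s) URL and returns the response body as text.
  static String downloadString(String url) {
    try {
      return readAll(URI.create(url).toURL().openStream());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical address; use the page you actually want to read.
    String html = downloadString("http://example.com/");
    System.out.println(html.length() + " characters downloaded");
  }
}
```

The returned string can then be searched for image tags, and the image address you find can be handed to loadImage() in the sketch.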

Re: Is it possible to read online HTML code?
Reply #1 - Sep 27th, 2009, 7:08am
 
Do you have a specific website or image in mind? If not, there are several APIs that make it easy to search for images, for example on Flickr or Google.

Or take a look at prohtml (http://creativecomputing.cc/p5libs/prohtml/). With it you can iterate over the HTML tree structure and look for images; it shouldn't be too hard to extract the image path from there.
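If you'd rather not add a library, a crude alternative is to scan the downloaded HTML text with a regular expression. This is a swapped-in technique, not prohtml's API: the class ImgSrcExtractor and its extract() method are hypothetical names, and the pattern is only a heuristic. It skips unquoted src values and can be fooled by unusual markup (an image tag inside an HTML comment would still be matched, for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Quick-and-dirty image finder: scans raw HTML text for <img ... src="...">.
// A regex heuristic, not a real HTML parser.
class ImgSrcExtractor {

  // "<img", any attributes, then whitespace + src = "value" or 'value'.
  // Case-insensitive, so IMG, Img and SRC match too.
  // Requiring whitespace before "src" avoids matching data-src.
  private static final Pattern IMG_SRC = Pattern.compile(
      "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']*)[\"']",
      Pattern.CASE_INSENSITIVE);

  // Returns every quoted src value found in img tags, in document order.
  static List<String> extract(String html) {
    List<String> sources = new ArrayList<>();
    Matcher m = IMG_SRC.matcher(html);
    while (m.find()) {
      sources.add(m.group(1));
    }
    return sources;
  }

  public static void main(String[] args) {
    String html = "<p><IMG alt=\"logo\" SRC=\"images/logo.png\"></p>"
        + "<img src='photo.jpg' />";
    System.out.println(extract(html)); // prints [images/logo.png, photo.jpg]
  }
}
```

A tree-based parser such as prohtml is more robust against messy markup; the regex is mainly useful for a quick experiment.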

Re: Is it possible to read online HTML code?
Reply #2 - Sep 27th, 2009, 7:49am
 
Hiya

I made a class that I think does what you want.

Code:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Collects the src of every image on a web page.
// DocumentBuilder is an XML parser, so this only works on well-formed
// (XHTML) pages; ordinary HTML with unclosed tags will fail to parse.
public class HtmlPictureReader {

  public HtmlPictureReader() {
  }

  // Downloads and parses the page at the given URL; returns null on failure.
  public Document readHtmlDoc(String url) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      DocumentBuilder db = dbf.newDocumentBuilder();
      InputStream in = read(url);
      try {
        Document doc = db.parse(in);
        doc.getDocumentElement().normalize();
        return doc;
      }
      finally {
        in.close();
      }
    }
    catch (Exception e) {
      // Report the problem instead of silently swallowing it.
      e.printStackTrace();
    }
    return null;
  }

  // Opens a stream to a web address (http://...), not a local file path.
  public InputStream read(String url) throws IOException {
    return new URL(url).openStream();
  }

  // Returns the src attribute of every img element in the document.
  public ArrayList<String> getPictures(Document doc) {
    ArrayList<String> pictures = new ArrayList<String>();

    // DOM tag names are case-sensitive, so look up each spelling separately.
    NodeList[] imageTags = new NodeList[3];

    imageTags[0] = doc.getElementsByTagName("img");
    imageTags[1] = doc.getElementsByTagName("IMG");
    imageTags[2] = doc.getElementsByTagName("Img");

    NamedNodeMap attributes;
    Node srcAttribute;

    for (int i = 0; i < imageTags.length; i++) {
      // Visit each element in the i-th list, using its own counter ii.
      for (int ii = 0; ii < imageTags[i].getLength(); ii++) {
        attributes = imageTags[i].item(ii).getAttributes();
        srcAttribute = attributes.getNamedItem("src");
        if (srcAttribute != null) {
          pictures.add(srcAttribute.getNodeValue());
        }
      }
    }

    return pictures;
  }

}


I haven't really tested the getPictures() method. Also keep in mind that an applet without special permissions (an unsigned applet) can only load files from its own domain.
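The src values that getPictures() returns are often relative (for example "images/photo.jpg"), so before loading one you have to resolve it against the address of the page it came from. A small sketch using java.net.URL's two-argument constructor, which applies the standard relative-URL rules, is below; ImageUrlResolver, resolve(), and the example addresses are hypothetical. The URL constructors are deprecated in recent JDKs but still work.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Turns an img src (absolute or relative) into a full address, using the
// URL of the page the src was found on as the base.
class ImageUrlResolver {

  static String resolve(String pageUrl, String src) {
    try {
      // An absolute src is kept as-is; a relative one is joined to the page URL.
      return new URL(new URL(pageUrl), src).toString();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Cannot resolve " + src + " against " + pageUrl, e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical page and image paths, for illustration only.
    String page = "http://somesite.com/gallery/page.html";
    System.out.println(resolve(page, "images/a.png"));           // http://somesite.com/gallery/images/a.png
    System.out.println(resolve(page, "/logo.gif"));              // http://somesite.com/logo.gif
    System.out.println(resolve(page, "http://other.org/b.jpg")); // http://other.org/b.jpg
  }
}
```

The resolved string is what you would pass to loadImage() in the sketch.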