Processing 1.0 - Processing Discourse - Its possible to read Online-HTML-Code?

mBuko

Its possible to read Online-HTML-Code?
Sep 27th, 2009, 5:13am

Hey,

I'm a newbie in Processing, so I would be happy to get some help.

I want to read the HTML code of a page and search it for a picture, which I then want to load in Processing.

But how is it possible to get the HTML code of a page? I just want to skip the step of opening Firefox and viewing the source there.

In C# you can use "WebClient" and "DownloadString". What about Processing?

Thanks

Cedric

Re: Its possible to read Online-HTML-Code?
Reply #1 - Sep 27th, 2009, 7:08am

Do you have a specific website or a specific image in mind? If not, there are several APIs that make it easy to search for images on Flickr or Google, for example.

Or maybe take a look at proHTML (http://creativecomputing.cc/p5libs/prohtml/). There you can iterate over the HTML tree structure and look for images. It shouldn't be too hard to extract the image path then.
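
If you would rather not pull in a library, here is a rough sketch of the "look for images, then extract the image path" idea in plain Java. It does not use proHTML at all. The class and method names (ImgSrcExtractor, extract, resolve) are made up for illustration, and the regular expression is only an approximation of real HTML parsing, so treat it as a starting point rather than a robust solution.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of proHTML or Processing.
class ImgSrcExtractor {

    // Matches <img ... src="..."> or src='...', ignoring case.
    // Assumption: the src value is quoted; unquoted values are skipped.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the raw src values in the order they appear in the HTML.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    // Resolves a possibly relative src against the address of the page.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }
}
```

The resolved address should then be usable with Processing's loadImage(), which can load from a URL as well as from a local file.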

twitter.com/CedricKiefer

michiel

Re: Its possible to read Online-HTML-Code?
Reply #2 - Sep 27th, 2009, 7:49am

Hiya

I made a class I think can do what you want.

Code:


import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HtmlPictureReader {
	
	public HtmlPictureReader() {
	}
	
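	// Parses the page at the given address into a DOM tree.
	// Note: this is an XML parser, so it only accepts well-formed (XHTML-like) pages.
	public Document readHtmlDoc(String url) {
		try {
			DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
			DocumentBuilder db = dbf.newDocumentBuilder();
			// Use the address passed in rather than a hard-coded one.
			Document doc = db.parse(read(url));
			doc.getDocumentElement().normalize();
			return doc;
		}
		catch(Exception e) {
			// Report the problem instead of silently returning null.
			e.printStackTrace();
		}
		return null;
	}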
	
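	// Opens a stream for either a web address or a local file path.
	public InputStream read(String location) {
		try {
			if (location.startsWith("http://") || location.startsWith("https://")) {
				// FileInputStream cannot open web addresses, so go through java.net.URL.
				return new java.net.URL(location).openStream();
			}
			return new FileInputStream(new File(location));
		} 
		catch (Exception e) {
			e.printStackTrace();
		}
		return null;
	}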
	
	public ArrayList<String> getPictures(Document doc) {
		ArrayList<String> pictures = new ArrayList<String>();
		
		NodeList[] imageTags = new NodeList[3];
		
		imageTags[0] = doc.getElementsByTagName("img");
		imageTags[1] = doc.getElementsByTagName("IMG");
		imageTags[2] = doc.getElementsByTagName("Img");
		
		NamedNodeMap attributes;
		Node srcAttribute;
		
		for(int i=0;i<imageTags.length;i++) {
			// The inner loop must test and increment ii, not i.
			for(int ii=0;ii<imageTags[i].getLength();ii++) {
				attributes = imageTags[i].item(ii).getAttributes();
				srcAttribute = attributes.getNamedItem("src");
				if(srcAttribute != null) {
					pictures.add(srcAttribute.getNodeValue());
				}
			}
		}
		
		return pictures;
	}
	
}

I haven't really tested the getPictures() method. You might want to consider that an applet, without special permission, can only get files from its own domain.
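
For the original question (something like C#'s DownloadString), a minimal sketch in plain Java might look like the following. PageSource, readAll and download are hypothetical names chosen for this example, and UTF-8 is an assumption about the page encoding. Inside a sketch, Processing's own loadStrings() can also take a web address and return the lines of the page, which may be all you need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;

// Hypothetical helper; the class and method names are illustrative only.
class PageSource {

    // Reads everything from the stream as text and closes the stream.
    // Assumption: the content is UTF-8 encoded.
    static String readAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    // Fetches the HTML of a page as one String (needs network access).
    static String download(String address) throws IOException {
        return readAll(URI.create(address).toURL().openStream());
    }
}
```

Reading into a String like this also sidesteps the XML parser, which will reject pages that are not well-formed. The applet restriction mentioned above still applies to download().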
