We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Error using xml library to read wikipedia
May 20th, 2008, 6:09am
 
Hello, I'm trying to parse Wikipedia links with the XML library, but I get an error when I try to load a Wikipedia article. I'm using the following code:

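import processing.xml.*;

void setup() {
  // Address of the Wikipedia article to load
  String url = "http://en.wikipedia.org/wiki/Car";
  // Fetch the page and parse it as XML (this is the call that throws)
  XMLElement test = new XMLElement(this, url);
}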

And I'm getting the following error:

processing.xml.XMLParseException: XML Parse Exception during parsing of a html element at line 82: Expected: /

Does anybody know what is going on? I was able to read RSS feeds, but for some reason I can't read Wikipedia articles. Is there any other option I could use besides the XML library?

Thanks
Re: Error using xml library to read wikipedia
Reply #1 - May 20th, 2008, 8:06am
 
You should probably use an HTML parser rather than an XML parser...

http://www.texone.org/prohtml/
Re: Error using xml library to read wikipedia
Reply #2 - May 20th, 2008, 2:02pm
 
It is strange, because the page is valid XHTML 1.0 Transitional (according to W3C's checker), so it should parse as XML without problems.
I see nothing wrong at or around that line.
It might be a bug in the library.

Using an HTML parser might indeed be the right solution: HTML parsers are more tolerant.
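proHTML's own API isn't shown in this thread, so as a library-free alternative, here is a minimal sketch that uses the lenient HTML parser shipped with the JDK (`javax.swing.text.html.parser.ParserDelegator`, which is based on HTML 3.2) to collect the `href` of every link in a page. It assumes plain Java rather than a Processing sketch, and that the page has already been downloaded into a String (fetching is not shown); `LinkExtractor` and `extractLinks` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class: extracts link targets from (possibly sloppy) HTML.
public class LinkExtractor {

    // Collects the href attribute of every <a> start tag the parser reports.
    public static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared in a <meta> tag (input is already a String)
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException("HTML parsing failed", e);
        }
        return links;
    }

    public static void main(String[] args) {
        // Unclosed <p> and a bare <br>: not well-formed XML, but fine for this parser.
        String html = "<html><body><p>See <a href=\"/wiki/Truck\">trucks</a> and"
                + "<p><a href=\"/wiki/Bus\">buses</a><br></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The point of the design is that the parser reports tags as events instead of building a tree, so unclosed or unbalanced tags that make an XML parser give up are simply passed over.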
Re: Error using xml library to read wikipedia
Reply #3 - May 21st, 2008, 1:22pm
 
That happens when parsing tags that contain both text and other nested tags:

Code:
<myXml>
<p>this line will be parsed without any problem</p>
<p>this line will throw a <b>XMLParseException</b></p>
</myXml>
Re: Error using xml library to read wikipedia
Reply #4 - May 21st, 2008, 2:41pm
 
Ah, but that's valid XML! Even newlines and indentation are text nodes in XML (IIRC).
Is that a limitation of XMLElement?
Should we fall back on Java's XML library?

[EDIT] OK, I see in XML Import that it uses NanoXML 2.2.3 Lite. The linked site states:
Quote:
NanoXML/Lite
   An extremely small (6KB) XML parser which is the successor of NanoXML 1. It only provides a limited functionality: no mixed content and the DTD is ignored.

I stressed the "no mixed content" part...
I suppose this library was chosen for its small size, or perhaps because it is simpler to use than other libraries.

Now, there are numerous XML parsers for Java, from XMLReader or the other NanoXML libraries to JDOM, dom4j, ElectricXML or XOM...
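As a quick check of the "fall back on Java's XML library" idea, the mixed-content example from Reply #3 can be given to the DOM parser built into the JDK (`javax.xml.parsers.DocumentBuilderFactory`). This is a minimal plain-Java sketch; `MixedContentDemo` and its methods are hypothetical names. A standard DOM parser accepts mixed content, so the second `<p>` comes back as a text node followed by a `<b>` element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical demo class: shows the JDK DOM parser handling mixed content.
public class MixedContentDemo {

    // Parses an XML string with the JDK's built-in DOM parser (no external library).
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the full text of every <p>, including text inside nested tags such as <b>.
    public static List<String> paragraphTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        NodeList ps = doc.getElementsByTagName("p");
        for (int i = 0; i < ps.getLength(); i++) {
            texts.add(ps.item(i).getTextContent());
        }
        return texts;
    }

    // Lists the node names of the direct children of the index-th <p>:
    // "#text" for a text node, the tag name for a nested element.
    public static List<String> childNodeNames(Document doc, int index) {
        List<String> names = new ArrayList<>();
        NodeList children = doc.getElementsByTagName("p").item(index).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<myXml>\n"
                + "<p>this line will be parsed without any problem</p>\n"
                + "<p>this line will throw a <b>XMLParseException</b></p>\n"
                + "</myXml>";
        Document doc = parse(xml);
        System.out.println(paragraphTexts(doc));
        System.out.println(childNodeNames(doc, 1)); // mixed content: a text node, then <b>
    }
}
```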
Re: Error using xml library to read wikipedia
Reply #5 - May 21st, 2008, 10:18pm
 

Thanks for your help; now I know why the XML parser wasn't working. I'm using proHTML now, and it has been working well so far. If I find something better I'll let you know.