Problems with æøå characters in XML

Hi

This is my first post here. I have a simple problem getting Scandinavian characters back from an XML document: Æ, Ø and Å all come out wrong with this code.

String url = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=dk&q=ærø";

void setup() {
  XML xml = loadXML(url);
  XML[] children = xml.getChildren("CompleteSuggestion");

  for (int i = 0; i < children.length; i++) {
    XML suggestion = children[i].getChild("suggestion");
    println(suggestion);
  } 
}

If you visit the URL http://suggestqueries.google.com/complete/search?output=toolbar&hl=dk&q=ærø in a browser, you can see that the characters Æ, Ø and Å all display correctly in the response itself.

Help is much appreciated.

Regards Andreas from Denmark

Answers

  • edited July 2015

    Indeed, when loading directly from the URL, the decoding fails.
    However, after saving the response and loading it locally, it strangely worked.
    Perhaps the original isn't UTF-8 and somehow became so after saving it locally? :-??

    static final String URL = "http://" + "suggestqueries.google.com/"
    + "complete/search?output=toolbar&hl=dk&q=%C3%A6r%C3%B8";
    
    void setup() {
      size(400, 150, JAVA2D);
      smooth(4);
      noLoop();
    
      background(0350);  // octal literal: 0350 == 232
      fill(#0080FF);
    
      textSize(050);     // octal literal: 050 == 40
      textAlign(CENTER, CENTER);
    
      //XML xml = loadXML(URL);
      XML xml = loadXML("search.xml");
    
      XML[] children = xml.getChildren("CompleteSuggestion");
      for (XML child : children)  println(child.getChild("suggestion"));
    
      XML suggestion = children[(int) random(children.length)].getChild("suggestion");
      text(suggestion.getString("data"), width>>1, height>>1);
    }
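
    Side note on the %C3%A6r%C3%B8 in the URL above: that is the UTF-8 percent-encoded form of ærø. A minimal plain-Java sketch of producing it with java.net.URLEncoder could look like this. The class and method names (QueryEncoding, encodeQuery) are made up for illustration, and the query is written with Unicode escapes so the result does not depend on how the source file itself is saved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncoding {
  // Percent-encodes a query value using its UTF-8 bytes.
  static String encodeQuery(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every Java runtime is required to support UTF-8.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String base = "http://suggestqueries.google.com/complete/search"
      + "?output=toolbar&hl=dk&q=";
    // "\u00E6r\u00F8" is "ærø" written with Unicode escapes.
    System.out.println(base + encodeQuery("\u00E6r\u00F8"));
    // The printed URL ends in q=%C3%A6r%C3%B8
  }
}
```

    This only makes the request unambiguous. It says nothing about which encoding the server uses for the XML it sends back.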
    
  • hmm, strange...

    I need to load it live from the URL, though. My sketch is going to be interactive (the example above is just boiled down to show the problem)...

  • edited July 2015
    • I'm afraid you'll first need to find out which encoding is being used there.
    • Then use loadBytes() to get that file as a byte[].
    • Create a new String from that byte[], passing along the Charset to decode it with.
    • Finally, call parseXML() on the decoded String.
    • Warning: "Not tested at all yet!" :-\"

    https://Processing.org/reference/loadBytes_.html
    http://docs.Oracle.com/javase/8/docs/api/java/lang/String.html#String-byte:A-java.nio.charset.Charset-
    http://docs.Oracle.com/javase/8/docs/api/java/nio/charset/Charset.html
    https://Processing.org/reference/parseXML_.html
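
    To see why the Charset matters in the steps above, here is a small plain-Java sketch that decodes the same three bytes two ways. The byte values are the ISO-8859-1 encoding of ærø; that the server really sends ISO-8859-1 is an assumption here, and DecodeDemo / decode are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
  // Turns raw bytes into text using an explicitly chosen charset.
  static String decode(byte[] data, Charset charset) {
    return new String(data, charset);
  }

  public static void main(String[] args) {
    // 0xE6 = æ, 0x72 = r, 0xF8 = ø when read as ISO-8859-1.
    byte[] raw = { (byte) 0xE6, (byte) 0x72, (byte) 0xF8 };

    String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
    String asUtf8 = decode(raw, StandardCharsets.UTF_8);

    // Compare against "ærø" written with Unicode escapes.
    System.out.println(asLatin1.equals("\u00E6r\u00F8")); // true
    System.out.println(asUtf8.equals("\u00E6r\u00F8"));   // false
  }
}
```

    In a sketch, the byte[] would come from loadBytes() and the correctly decoded String would go to parseXML().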

  • Thanks! I think it is a bit above my skill level... How do I find out which encoding is being used?
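
    One common place to look is the charset parameter of the HTTP Content-Type response header, which in Java is available through java.net.URLConnection.getContentType(). Below is a rough sketch of pulling the charset out of such a header value. CharsetSniffer and charsetFromContentType are hypothetical names, and the header string in main() is a made-up sample, not a recording of what this particular server sends.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSniffer {
  // Returns the charset named in a Content-Type header value,
  // or the fallback if none is given or the name is not usable.
  static Charset charsetFromContentType(String contentType, Charset fallback) {
    if (contentType == null) return fallback;

    for (String part : contentType.split(";")) {
      String p = part.trim();
      if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
        String name = p.substring("charset=".length()).trim().replace("\"", "");
        try {
          return Charset.forName(name);
        } catch (IllegalArgumentException e) {
          // Illegal or unsupported charset name.
          return fallback;
        }
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    // Sample header value (an assumption, for illustration only).
    String header = "text/xml; charset=ISO-8859-1";
    System.out.println(charsetFromContentType(header, StandardCharsets.UTF_8));
    // prints ISO-8859-1
  }
}
```

    If the header carries no charset, the encoding attribute in the XML declaration on the first line of the response (when present) is the next thing to check.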

  • edited July 2015 Answer ✓

    // forum.Processing.org/two/discussion/11860/problems-with-aeoa-characters-in-xml
    // 2015-Jul-28
    
    static final String URL = "http://" + "SuggestQueries.Google.com/complete/"
    + "search?output=toolbar&hl=dk&q=ærø";
    
    void setup() {
      size(400, 150, JAVA2D);
      smooth(4);
      noLoop();
    
      background(0350);  // octal literal: 0350 == 232
      fill(#0080FF);
    
      textSize(050);     // octal literal: 050 == 40
      textAlign(CENTER, CENTER);
    
      //XML xml = loadXML(URL);
      //XML xml = loadXML("search.xml");
    
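      // Fetch the raw bytes and decode them explicitly as ISO-8859-1
      // rather than relying on loadXML()'s own decoding:
      byte[] data = loadBytes(URL);
      String decoded = new String(data, java.nio.charset.Charset.forName("ISO-8859-1"));
    
      XML xml = parseXML(decoded);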
    
      XML[] children = xml.getChildren("CompleteSuggestion");
      for (XML child : children)  println(child.getChild("suggestion"));
    
      XML suggestion = children[(int) random(children.length)].getChild("suggestion");
      text(suggestion.getString("data"), width>>1, height>>1);
    }
    
  • Wow! Thanks a lot GoToLoop :-)

    I really appreciate it, hope I myself can help others the way you just helped me!
