Parsing unusual XML file

copytco · March 2015

Hi there,

I am trying to parse a BeerXML file (the specification can be found at www.beerxml.com), which looks somewhat unusual to me. The structure is more or less as shown below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<RECIPES>
   <RECIPE>
     <NAME>Wiezenbock</NAME>
     <VERSION>1</VERSION>
     <TYPE>All Grain</TYPE>
     <BREWER>Panin</BREWER>
     <ASST_BREWER></ASST_BREWER>
     <BATCH_SIZE>20.0000000</BATCH_SIZE>
     <BOIL_SIZE>25.3400000</BOIL_SIZE>
     <BOIL_TIME>70.0000000</BOIL_TIME>
     <EFFICIENCY>70.0000000</EFFICIENCY>
     <HOPS>
        <HOP>
        <NAME>Lublin</NAME>
        <VERSION>1</VERSION>
        <ORIGIN>Poland</ORIGIN>
        <ALPHA>5.0000000</ALPHA>
        </HOP>
        ...
      </HOPS>
       ...
   </RECIPE>
</RECIPES>

And so on. Apart from the wrong encoding (I will take care of that myself), there are many children, yet the built-in XML parser reads only one parent, "RECIPES", and only one child, "RECIPE". The XML file is generated automatically by the BeerSmith software, so I cannot modify the output. Is there any way to properly extract, for example, all values of "NAME" for every "HOP" found within "HOPS"?

PhiLho · March 2015

I don't see how the XML is "unusual". Looks OK to me. Processing has the loadXML() method; have you looked at it?

GoToLoop · March 2015

Like @PhiLho said, that's a valid XML file: :-\"

XML beer = loadXML("beer.xml");
println(beer);
exit();

However, if you're only looking for NAME values inside HOP entries, a custom parser can be written too: :-bd

// forum.processing.org/two/discussion/10016/parsing-ususual-xml-file

void setup() {
  String[] beers = loadStrings("beer.xml");

  printArray(beers);
  println();

  String[] names = findHopNames(beers);
  printArray(names);
  exit();
}

static final String[] findHopNames(String[] arr) {
  if (arr == null || arr.length == 0)  return new String[0];

  StringList sl = new StringList();
  boolean hopFlag = false;

  for (String s : arr) {
    if (s == null)  continue;

    if      (s.contains("<HOP>"))   hopFlag = true;
    else if (s.contains("</HOP>"))  hopFlag = false;
    else if (hopFlag) {
      String name = extractHopName(s);

      if (name != null) {
        sl.append(name);
        hopFlag = false;
      }
    }
  }

  return sl.array();
}

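protected static final String extractHopName(String s) {
  int tag = s.indexOf("<NAME>");
  if (tag < 0)  return null; // no NAME tag on this line

  int idx = tag + 6, end = s.indexOf("</", idx); // 6 == "<NAME>".length()
  return end >= 0? s.substring(idx, end) : null; // null if the closing tag is missing
}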

PhiLho · March 2015

Bad idea to use a custom parser for XML, in general... As shown in the loadXML page, the loaded XML can be walked to inspect the children, to an arbitrary depth.
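
The walk to an arbitrary depth mentioned above can be sketched as a short recursive function. The block below is only a minimal sketch, written in plain Java with the JDK's built-in DOM parser rather than Processing's XML class, so that it can run outside a sketch; inside Processing the same idea applies with getChildren(). The class name HopNameWalker, its method names, and the second hop in the sample data ("SampleHop") are made up for illustration and are not part of the original thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Processing or BeerXML.
class HopNameWalker {

  // Recursively visits every element below 'node'. Whenever a HOP element
  // is found, the text of its direct NAME child is appended to 'out'.
  static void collectHopNames(Node node, List<String> out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() != Node.ELEMENT_NODE)  continue; // skip text nodes
      if (child.getNodeName().equals("HOP")) {
        String name = directChildText((Element) child, "NAME");
        if (name != null)  out.add(name);
      } else {
        collectHopNames(child, out); // descend one level deeper
      }
    }
  }

  // Returns the trimmed text of the first direct child named 'tag', or null.
  static String directChildText(Element parent, String tag) {
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.ELEMENT_NODE
          && child.getNodeName().equals(tag)) {
        return child.getTextContent().trim();
      }
    }
    return null;
  }

  // Parses an XML string and returns all hop names in document order.
  static List<String> hopNames(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      List<String> names = new ArrayList<>();
      collectHopNames(doc.getDocumentElement(), names);
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    // Trimmed-down sample; "SampleHop" is an invented second entry.
    String xml = "<RECIPES><RECIPE><NAME>Wiezenbock</NAME><HOPS>"
        + "<HOP><NAME>Lublin</NAME><ORIGIN>Poland</ORIGIN></HOP>"
        + "<HOP><NAME>SampleHop</NAME></HOP>"
        + "</HOPS></RECIPE></RECIPES>";
    System.out.println(hopNames(xml));
  }
}
```

Because the function descends into every element that is not a HOP, it picks up hops from any number of RECIPE entries without hard-coding the RECIPES/RECIPE/HOPS path, and the recipe's own NAME is ignored since only NAME children of HOP are read.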

copytco · March 2015

PhiLho, GoToLoop, thank you both for the contribution.

GoToLoop, your custom parser is a great solution, but why write new functions when the default library should handle the problem out of the box? :) If we cannot figure out what I am doing wrong and I still cannot parse the file the normal way, I will use your solution.

PhiLho, I am using loadXML(), but somehow I am not able to extract the information I want. I can only get "RECIPES" as the parent and "RECIPE" as the child. To write my code I used the reference at https://processing.org/reference/loadXML_.html . Maybe I am just missing something obvious, but I would be grateful if you could show me how to get the name of each hop within the recipe.

copytco · March 2015

OK, I have figured it out. The working code is below.

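XML xml;

void setup() {
  xml = loadXML("recipe.xml");

  // Walk down the tree one level at a time: RECIPES > RECIPE > HOPS > HOP.
  // Only the first RECIPE and its first HOPS block are read.
  XML[] recipe = xml.getChildren("RECIPE");
  XML[] hops = recipe[0].getChildren("HOPS");
  XML[] hop = hops[0].getChildren("HOP");

  String[] hop_names = new String[hop.length];

  for (int i = 0; i < hop.length; i++) {
    // The text content of the first NAME child is the hop name
    XML[] x = hop[i].getChildren("NAME");
    hop_names[i] = x[0].getContent();
    println(hop_names[i]);
  }
}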

PhiLho · March 2015

Good! Thanks for sharing your solution.
Pro Tip when asking for help: show your attempt, so that we can see what went wrong... :-) Even better here as you solved the issue by yourself.
