Parsing unusual XML file

edited March 2015 in How To...

Hi there,

I am trying to parse a BeerXML file (the specification can be found at www.beerxml.com), which is kind of unusual. The structure looks more or less like the sample below.

<?xml version="1.0" encoding="ISO-8859-1"?>
<RECIPES>
   <RECIPE>
     <NAME>Wiezenbock</NAME>
     <VERSION>1</VERSION>
     <TYPE>All Grain</TYPE>
     <BREWER>Panin</BREWER>
     <ASST_BREWER></ASST_BREWER>
     <BATCH_SIZE>20.0000000</BATCH_SIZE>
     <BOIL_SIZE>25.3400000</BOIL_SIZE>
     <BOIL_TIME>70.0000000</BOIL_TIME>
     <EFFICIENCY>70.0000000</EFFICIENCY>
     <HOPS>
        <HOP>
        <NAME>Lublin</NAME>
        <VERSION>1</VERSION>
        <ORIGIN>Poland</ORIGIN>
        <ALPHA>5.0000000</ALPHA>
        </HOP>
        ...
      </HOPS>
       ...
   </RECIPE>
</RECIPES>

And so on. Setting aside the wrong encoding (I will take care of this myself), there are many children here, but the built-in XML parser reads only one parent, "RECIPES", and only one child, "RECIPE". The XML file is automatically generated by BeerSmith software, so I cannot modify the output. Is there any way to properly extract, for example, the values of "NAME" for every "HOP" found within "HOPS"?

Answers

  • I don't see how the XML is "unusual". Looks OK to me. Processing has the loadXML() method, have you looked at it?

  • edited March 2015

    Like @PhiLho said, that's a valid XML file: :-\"

    XML beer = loadXML("beer.xml");
    println(beer);
    exit();
    

    However, if you're only looking for NAME values inside HOP entries, a custom parser can be written too: :-bd

    // forum.processing.org/two/discussion/10016/parsing-ususual-xml-file
    
    void setup() {
      String[] beers = loadStrings("beer.xml");
    
      printArray(beers);
      println();
    
      String[] names = findHopNames(beers);
      printArray(names);
      exit();
    }
    
    static final String[] findHopNames(String[] arr) {
      if (arr == null || arr.length == 0)  return new String[0];
    
      StringList sl = new StringList();
      boolean hopFlag = false;
    
      for (String s : arr) {
        if (s == null)  continue;
    
        if      (s.contains("<HOP>"))   hopFlag = true;
        else if (s.contains("</HOP>"))  hopFlag = false;
        else if (hopFlag) {
          String name = extractHopName(s);
    
          if (name != null) {
            sl.append(name);
            hopFlag = false;
          }
        }
      }
    
      return sl.array();
    }
    
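    protected static final String extractHopName(String s) {
      int tag = s.indexOf("<NAME>");
      if (tag < 0)  return null;  // no NAME element on this line
    
      int idx = tag + 6, end = s.indexOf("</", idx);  // 6 == "<NAME>".length()
      return end >= 0? s.substring(idx, end) : null;  // null when the closing tag is missing
    }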
    
  • Answer ✓

    Bad idea to use a custom parser for XML, in general... As shown in the loadXML page, the loaded XML can be walked to inspect the children, to an arbitrary depth.
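
To illustrate what walking the tree "to an arbitrary depth" can look like, here is a minimal sketch in plain Java. It is an assumption-laden illustration, not the approach anyone in the thread posted: it uses the JDK's own DOM classes (`javax.xml.parsers`, `org.w3c.dom`) rather than Processing's `XML` class, and the class name `XmlWalk`, the helper names, and the trimmed-down `SAMPLE` string are hypothetical. The same recursion should carry over to Processing's `XML` objects, but that is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlWalk {
  // Hypothetical, trimmed-down sample based on the structure shown in the question.
  static final String SAMPLE =
      "<RECIPES><RECIPE><NAME>Wiezenbock</NAME><HOPS>"
    + "<HOP><NAME>Lublin</NAME><ORIGIN>Poland</ORIGIN></HOP>"
    + "</HOPS></RECIPE></RECIPES>";

  // Parses the text and returns one "path = value" line per leaf element.
  static List<String> leaves(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      List<String> out = new ArrayList<>();
      walk(root, root.getTagName(), out);
      return out;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  // Depth-first recursion: descend into every child element, whatever the depth.
  static void walk(Element el, String path, List<String> out) {
    boolean hasChildElement = false;
    NodeList kids = el.getChildNodes();
    for (int i = 0; i < kids.getLength(); i++) {
      Node kid = kids.item(i);
      if (kid.getNodeType() == Node.ELEMENT_NODE) {
        hasChildElement = true;
        walk((Element) kid, path + "/" + kid.getNodeName(), out);
      }
    }
    // Only elements without child elements carry a value worth reporting.
    if (!hasChildElement) out.add(path + " = " + el.getTextContent().trim());
  }

  public static void main(String[] args) {
    for (String line : leaves(SAMPLE)) System.out.println(line);
  }
}
```

For the sample string, the hop name shows up as the line `RECIPES/RECIPE/HOPS/HOP/NAME = Lublin`, which makes it easy to see where in the hierarchy each value lives before writing more targeted code.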

  • PhiLho, GoToLoop, thank you both for the contribution.

    GoToLoop, your custom parser is a great solution, but why write new functions when the default library should handle the problem out of the box? :) If we cannot figure out what I am doing wrong, and I still cannot parse the file the normal way, I will use your solution.

    PhiLho, I am using loadXML(), but somehow I am not able to extract the information I want. I can only get "RECIPES" as the parent and "RECIPE" as the child. To write my code I used the reference here: https://processing.org/reference/loadXML_.html . Maybe I am just missing something obvious, but I would be grateful if you could show me how to get the name of each hop within the recipe.

  • OK, I have figured it out. The working code is below.

    XML xml;
    
    void setup() {
      xml = loadXML("recipe.xml");
      XML[] recipe = xml.getChildren("RECIPE");
      XML[] hops = recipe[0].getChildren("HOPS");
      XML[] hop = hops[0].getChildren("HOP");
    
      String[] hop_names = new String[hop.length];
    
      for (int i = 0; i < hop.length; i++)
      {
        XML[] x = hop[i].getChildren("NAME");
        hop_names[i] = x[0].getContent();
        println(hop_names[i]);
      }
    }
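
The code above reads only `recipe[0]` and `hops[0]`, and it assumes every HOP has a NAME. As a hedged variation on the same idea, the sketch below collects hop names across every RECIPE and skips any HOP without a NAME child. It is written in plain Java with the JDK's DOM classes, not Processing's `XML` class; the class name `HopNames`, the method `hopNames`, and the second recipe in `SAMPLE` (with its "Example" values) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class HopNames {
  // Hypothetical sample: the second recipe and its "Example" values are invented.
  static final String SAMPLE =
      "<RECIPES>"
    + "<RECIPE><NAME>Wiezenbock</NAME><HOPS>"
    + "<HOP><NAME>Lublin</NAME></HOP></HOPS></RECIPE>"
    + "<RECIPE><NAME>Example Ale</NAME><HOPS>"
    + "<HOP><ORIGIN>Poland</ORIGIN></HOP>"          // a HOP with no NAME
    + "<HOP><NAME>Example Hop</NAME></HOP></HOPS></RECIPE>"
    + "</RECIPES>";

  // Returns the NAME of every HOP in the document, in document order.
  static List<String> hopNames(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      List<String> names = new ArrayList<>();
      // Matches elements named exactly HOP (not HOPS), in every RECIPE.
      NodeList hops = doc.getElementsByTagName("HOP");
      for (int i = 0; i < hops.getLength(); i++) {
        // Look only at direct children, so a recipe's own NAME is never picked up.
        NodeList kids = hops.item(i).getChildNodes();
        for (int j = 0; j < kids.getLength(); j++) {
          Node kid = kids.item(j);
          if (kid.getNodeType() == Node.ELEMENT_NODE && "NAME".equals(kid.getNodeName())) {
            names.add(kid.getTextContent().trim());
            break;  // take the first NAME of this HOP and move on
          }
        }
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(hopNames(SAMPLE));
  }
}
```

Checking only the direct children of each HOP is a deliberate choice: a RECIPE also has a NAME element, so a document-wide search for NAME would mix recipe names in with hop names.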
    
  • Good! Thanks for sharing your solution.
    Pro Tip when asking for help: show your attempt, so that we can see what went wrong... :-) Even better here as you solved the issue by yourself.
