From CSV table to XML

edited February 2016 in Questions about Code

Hello,
I'm trying to convert a CSV table, which has a tree-like structure, into XML. Is it possible? I don't really get how.
Cheers

    XML xml;
    Table table;

    void setup() {
      xml = new XML("science");
      // the file is tab-separated, so the "tsv" option is used even though it is named .csv
      table = loadTable("data.csv", "header, tsv");

      XML child = xml.addChild("area");
      child.setString("name", "Chemistry");

      // with the "header" option, row 0 is already the first data row
      for (int i = 0; i < table.getRowCount(); i++) {
        // if the "area" field in the table equals the child's name in the xml
        if (table.getString(i, "area").equals(child.getString("name"))) {
          println(child.getString("name"));
          // then add the "subject" field from the table as a new sub-child in the xml
          XML subChild = child.addChild("subject");
          subChild.setContent(table.getString(i, "subject"));
        }
      }

      println(xml);
    }

Answers

  • Can you post the CSV or the relevant parts of it?

  • edited February 2016

    Of course. The separator is a tab, and there is a header row:

    area    subject
    Chemistry  Analytical Chemistry
    Chemistry  Chemistry (misc)
    Chemistry  Electrochemistry
    Chemistry  Inorganic Chemistry
    Chemistry  Organic Chemistry
    Chemistry  Physical and Theoretical Chemistry
    Chemistry  Spectroscopy
    Computer Science  Artificial Intelligence
    Computer Science  Computational Theory and Mathematics
    Computer Science  Computer Graphics and Computer-Aided Design
    Computer Science  Computer Networks and Communications
    Computer Science  Computer Science (misc)
    Computer Science  Computer Science Applications
    Computer Science  Computer Vision and Pattern Recognition
    Computer Science  Hardware and Architecture
    Computer Science  Human-Computer Interaction
    Computer Science  Information Systems
    Computer Science  Signal Processing
    Computer Science  Software
    Decision Sciences  Decision Sciences (misc)
    Decision Sciences  Information Systems and Management
    Decision Sciences  Management Science and Operations Research
    Decision Sciences  Statistics, Probability and Uncertainty
    
  • You could first insert all the words from the 1st column into a HashMap (see the reference).

    Each word will then be unique.

    Then loop over the HashMap and, for each word in it, check the 2nd column and build the XML.
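A minimal plain-Java sketch of that two-pass idea, assuming the rows have already been read as {area, subject} string pairs (in Processing they would come from the Table). A LinkedHashSet stands in for the HashMap here, since only the unique first-column keys are needed in the first pass, and it keeps them in first-seen order. The class and method names (TwoPassGrouping, uniqueAreas, subjectsOf) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TwoPassGrouping {

    // Pass 1: collect each first-column word once (the set ignores duplicates).
    static Set<String> uniqueAreas(List<String[]> rows) {
        Set<String> areas = new LinkedHashSet<>();
        for (String[] row : rows) {
            areas.add(row[0]);
        }
        return areas;
    }

    // Pass 2: for one area, scan all rows and collect the matching second-column values.
    static List<String> subjectsOf(String area, List<String[]> rows) {
        List<String> subjects = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].equals(area)) {
                subjects.add(row[1]);
            }
        }
        return subjects;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"Chemistry", "Analytical Chemistry"},
            new String[] {"Computer Science", "Software"},
            new String[] {"Chemistry", "Electrochemistry"});
        for (String area : uniqueAreas(rows)) {
            // This is where an <area> element and its <subject> children would be created.
            System.out.println(area + " -> " + subjectsOf(area, rows));
        }
    }
}
```

Scanning every row once per area is fine for a few dozen rows; for larger tables a single pass that fills an area -> subjects map avoids the repeated scans.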

  • That's not a .csv file (CSV = Comma Separated Values). I can't even spot a single comma there! :-?

  • Here is the data with a comma separator, thanks:

    area,subject
    Chemistry,Analytical Chemistry
    Chemistry,Chemistry (misc)
    Chemistry,Electrochemistry
    Chemistry,Inorganic Chemistry
    Chemistry,Organic Chemistry
    Chemistry,Physical and Theoretical Chemistry
    Chemistry,Spectroscopy
    Computer Science,Artificial Intelligence
    Computer Science,Computational Theory and Mathematics
    Computer Science,Computer Graphics and Computer-Aided Design
    Computer Science,Computer Networks and Communications
    Computer Science,Computer Science (misc)
    Computer Science,Computer Science Applications
    Computer Science,Computer Vision and Pattern Recognition
    Computer Science,Hardware and Architecture
    Computer Science,Human-Computer Interaction
    Computer Science,Information Systems
    Computer Science,Signal Processing
    Computer Science,Software
    Decision Sciences,Decision Sciences (misc)
    Decision Sciences,Information Systems and Management
    Decision Sciences,Management Science and Operations Research
    Decision Sciences,"Statistics, Probability and Uncertainty"
    Dentistry,Dental Assisting
    Dentistry,Dental Hygiene
    Dentistry,Dentistry (misc)
    Dentistry,Oral Surgery
    Dentistry,Orthodontics
    Dentistry,Periodontics
    
  • edited February 2016
    /**
     * CSV to XML (v1.0)
     * GoToLoop (2016-Feb-17)
     * forum.Processing.org/two/discussion/14956/from-csv-table-to-xml
     */
    
    final Table csv = loadTable("data.csv", "header, csv");
    final String[] titles = csv.getColumnTitles();
    println(titles);
    println(csv.getRowCount());
    
    final XML xml = new XML("science");
    println(xml);
    println();
    
    for (final TableRow r : csv.rows()) {
      final XML area = xml.addChild(titles[0]);
      area.setContent(r.getString(titles[0]));
      area.setString(titles[1], r.getString(titles[1]));
      println(area);
    }
    
    final String path = dataPath(xml.getName() + ".xml");
    saveXML(xml, path);
    
    println();
    println(path);
    exit();
    
  • edited February 2016
    /**
     * CSV to XML (v2.1)
     * GoToLoop (2016-Feb-17)
     * forum.Processing.org/two/discussion/14956/from-csv-table-to-xml
     */
    
    final Table csv = loadTable("data.csv", "header, csv");
    final String[] titles = csv.getColumnTitles();
    println(titles);
    println(csv.getRowCount());
    
    final XML xml = new XML("science");
    println(xml);
    println();
    
    for (final TableRow r : csv.rows()) {
      final XML subject = xml.addChild(titles[1]);
      subject.setContent(r.getString(1));
      subject.setString(titles[0], r.getString(0));
      println(subject);
    }
    
    final String path = dataPath(xml.getName() + ".xml");
    saveXML(xml, path);
    
    println();
    println(path);
    exit();
    
  • Okay, thanks, but we don't get the tree structure.
    I would like to have:

        <area name="Chemistry">
              <subject>Analytical Chemistry</subject>
              <subject>Chemistry (misc)</subject>
              <subject>Electrochemistry</subject>
              <subject>Organic Chemistry</subject>
        </area>
        <area name="Computer Science">
              <subject>Information Systems</subject>
              <subject>Human-Computer Interaction</subject>
              <subject>Hardware and Architecture</subject>
        </area>


    So we have to add the <subject> children inside each <area> child.
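A sketch of that nesting outside Processing, using the JDK's built-in DOM API (javax.xml and org.w3c.dom) rather than Processing's XML class, and assuming the subjects are already grouped per area in a map. The class NestedXml and method toXml are hypothetical. The key step is that each <subject> element is appended to its <area> element, not to the <science> root.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NestedXml {

    // Builds <science><area name="..."><subject>...</subject></area></science>
    // from an area -> subjects map.
    static String toXml(Map<String, Set<String>> areas) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element science = doc.createElement("science");
            doc.appendChild(science);
            for (Map.Entry<String, Set<String>> e : areas.entrySet()) {
                Element area = doc.createElement("area");
                area.setAttribute("name", e.getKey());
                science.appendChild(area);
                for (String subject : e.getValue()) {
                    Element sub = doc.createElement("subject");
                    sub.setTextContent(subject);
                    area.appendChild(sub); // nested under <area>, not under <science>
                }
            }
            // Serialize the DOM tree to an indented string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> areas = new LinkedHashMap<>();
        areas.put("Chemistry", new LinkedHashSet<>(Arrays.asList("Analytical Chemistry", "Electrochemistry")));
        areas.put("Computer Science", new LinkedHashSet<>(Arrays.asList("Signal Processing")));
        System.out.println(toXml(areas));
    }
}
```

The exact whitespace of the indented output depends on the JDK's transformer, but the element nesting is fixed by which parent each child is appended to.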

  • edited February 2016 Answer ✓
    /**
     * CSV to XML (v3.14)
     * GoToLoop (2016-Feb-17)
     * forum.Processing.org/two/discussion/14956/from-csv-table-to-xml
     */
    
    final Table csv = loadTable("data.csv", "header, csv");
    final String[] headers = csv.getColumnTitles();
    println(headers);
    println("# of subjects:", csv.getRowCount(), ENTER);
    
    final XML xml = new XML("science");
    println(xml);
    
    import java.util.Map;
    import java.util.LinkedHashMap;
    import java.util.Set;
    import java.util.LinkedHashSet;
    
    final Map<String, Set<String>> areas = new LinkedHashMap<String, Set<String>>();
    
    for (final TableRow r : csv.rows()) {
      final String area = r.getString(0), subject = r.getString(1);
      Set<String> v = areas.get(area);
      if (v == null)  areas.put(area, v = new LinkedHashSet<String>());
      v.add(subject);
    }
    
    println(areas.keySet());
    println("# of areas:", areas.size());
    
    for (final Map.Entry<String, Set<String>> kv : areas.entrySet()) {
      final XML area = xml.addChild(headers[0]);
      area.setString("name", kv.getKey());
      println(ENTER, area, ENTER);
    
      for (final String subject : kv.getValue()) {
        area.addChild(headers[1]).setContent(subject);
        println(subject);
      }
    }
    
    final String path = dataPath(xml.getName() + ".xml");
    saveXML(xml, path);
    
    println();
    println(path);
    exit();
    
  • Thanks,
    but we still don't have a tree structure. As before, area and subject end up in the same element, <area name="Computer Science Artificial Intelligence">, whereas I would like to have:

            <area name="Computer Science">
                 <subject>Artificial Intelligence</subject>
                 <subject>Signal Processing</subject>
            </area>
    

    The <subject> is a child of <area>; it is a subfield.

  • Maybe make a scribble?

  • edited February 2016

    @mxloizix, are you sure you really checked the "science.xml" output from "data.csv" input? 8-|

    <?xml version="1.0" encoding="UTF-8"?>
    <science>
      <area name="Chemistry">
        <subject>Analytical Chemistry</subject>
        <subject>Chemistry (misc)</subject>
        <subject>Electrochemistry</subject>
        <subject>Inorganic Chemistry</subject>
        <subject>Organic Chemistry</subject>
        <subject>Physical and Theoretical Chemistry</subject>
        <subject>Spectroscopy</subject>
      </area>
      <area name="Computer Science">
        <subject>Artificial Intelligence</subject>
        <subject>Computational Theory and Mathematics</subject>
        <subject>Computer Graphics and Computer-Aided Design</subject>
        <subject>Computer Networks and Communications</subject>
        <subject>Computer Science (misc)</subject>
        <subject>Computer Science Applications</subject>
        <subject>Computer Vision and Pattern Recognition</subject>
        <subject>Hardware and Architecture</subject>
        <subject>Human-Computer Interaction</subject>
        <subject>Information Systems</subject>
        <subject>Signal Processing</subject>
        <subject>Software</subject>
      </area>
      <area name="Decision Sciences">
        <subject>Decision Sciences (misc)</subject>
        <subject>Information Systems and Management</subject>
        <subject>Management Science and Operations Research</subject>
        <subject>Statistics, Probability and Uncertainty</subject>
      </area>
      <area name="Dentistry">
        <subject>Dental Assisting</subject>
        <subject>Dental Hygiene</subject>
        <subject>Dentistry (misc)</subject>
        <subject>Oral Surgery</subject>
        <subject>Orthodontics</subject>
        <subject>Periodontics</subject>
      </area>
    </science>
    
  • edited February 2016

    Okay, my apologies: my data was in TSV, so the output wasn't the same.
    It is perfect.
    Thank you for your patience.

  • edited February 2016

    That's OK! I believe this little change, from:
        final Table csv = loadTable("data.csv", "header, csv");
    to:
        final Table csv = loadTable("data.tsv", "header, tsv");
    should let it read your original file. :bz
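A simplified illustration of why the separator option has to match the file. This uses plain String.split, not Processing's actual Table parser, so it is only a sketch of the effect: splitting a tab-separated line on commas leaves the whole line in a single field. That is consistent with the combined <area name="Computer Science Artificial Intelligence"> the asker saw. The class DelimiterDemo and method fields are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterDemo {

    // Naive split on one delimiter character (no quote handling),
    // only to show what a mismatched separator does to a line.
    static String[] fields(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        String tsvLine = "Computer Science\tArtificial Intelligence";
        System.out.println(Arrays.toString(fields(tsvLine, ',')));  // 1 field: the whole line
        System.out.println(Arrays.toString(fields(tsvLine, '\t'))); // 2 fields: area, subject
    }
}
```

Because this naive split ignores quotes, it would also break the quoted "Statistics, Probability and Uncertainty" field in the comma version; a real CSV reader such as loadTable with the "csv" option is the safer choice there.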

  • edited February 2016

    Yeah, luckily I get the Table class :)
    whereas this Map<String, Set<String>> areas = new LinkedHashMap<String, Set<String>>(); is more difficult for me :)
    Thanks again

  • edited February 2016

    Unfortunately, that complex Map was necessary in order to group all subjects under their respective area tag. ^#(^ Otherwise it would end up with 29 area tags (one per CSV row), each with only 1 subject inside. :-&

    1. https://Processing.org/reference/HashMap.html
    2. http://docs.Oracle.com/javase/8/docs/api/java/util/Map.html
    3. http://docs.Oracle.com/javase/8/docs/api/java/util/Map.Entry.html
    4. http://docs.Oracle.com/javase/8/docs/api/java/util/Set.html
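To make the 29-versus-4 difference concrete, here is a small sketch on three of the rows: the flat approach emits one element per row, while grouping by area emits one <area> per distinct area. It also shows Map.computeIfAbsent (Java 8+) as an alternative to the get / null check / put steps used in v3.14; that swap is my suggestion, not what the thread's code does, and older Processing editors may not accept the lambda syntax. The names MapVsFlat and group are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapVsFlat {

    // One key per distinct first-column value; subjects collected in a set per area.
    // computeIfAbsent creates the set the first time an area is seen.
    static Map<String, Set<String>> group(List<String[]> rows) {
        Map<String, Set<String>> areas = new LinkedHashMap<>();
        for (String[] r : rows) {
            areas.computeIfAbsent(r[0], k -> new LinkedHashSet<>()).add(r[1]);
        }
        return areas;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Dentistry", "Dental Assisting"});
        rows.add(new String[] {"Dentistry", "Dental Hygiene"});
        rows.add(new String[] {"Decision Sciences", "Decision Sciences (misc)"});
        // Without grouping: one element per row.
        System.out.println("flat elements: " + rows.size());         // 3
        // With grouping: one <area> per distinct area.
        System.out.println("grouped areas: " + group(rows).size());  // 2
    }
}
```

On the full data.csv the same comparison gives 29 rows against 4 distinct areas.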
  • edited February 2016

    Since I'm trying to learn Python/Jython via Python Mode, I decided to convert the Java Mode version.
    Much less boilerplate for sure B-), although I had to import almost everything! X(

    '''
     # CSV to XML (v3.14)
     # GoToLoop (2016-Feb-17)
     # forum.Processing.org/two/discussion/14956/from-csv-table-to-xml
    '''
    
    csv = loadTable('data.csv', 'header, csv')
    headers = csv.getColumnTitles()
    #headers = tuple(csv.getColumnTitles())
    print headers
    print '# of subjects:', csv.getRowCount(), ENTER
    
    from processing.data import XML
    xml = XML('science')
    print xml
    
    from collections import OrderedDict
    areas = OrderedDict()
    
    from java.util import LinkedHashSet
    for r in csv.rows():
        area, subject = r.getString(0), r.getString(1)
        v = areas.get(area)
        if v is None: v = areas[area] = LinkedHashSet()
        v.add(subject)
    
    print areas.keys()
    print '# of areas:', len(areas)
    
    for k, v in areas.items():
        area = xml.addChild(headers[0])
        area.setString('name', k)
        print ENTER, area, ENTER
    
        for subject in v:
            area.addChild(headers[1]).setContent(subject)
            print subject
    
    from processing.core.PApplet import dataPath
    path = dataPath(this, xml.getName() + '.xml')
    saveXML(xml, path)
    
    print ENTER, path
    exit()
    