We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
duplicate parsed data (Read 1453 times)
duplicate parsed data
Feb 24th, 2010, 11:06am
 
How would I go about finding duplicates in parsed data?
For example, how do I detect the duplicate value ("topic1") in:
<subjects tag1="topic1" tag2="topic2"/>
<subjects tag1="topic1" tag2="topic3" tag3="topic4" tag4="topic5"/>
subjects is a child of an element (node), so I need to search the attributes of the subjects child across two 'levels' (two node elements).

Looking through other posts, it seems I could use an ArrayList or, having seen some advice on Comparable/Comparator (whatever that is?), a Hashtable. But having never done this, I hoped someone might advise how to go about it.

my sketch can be accessed here:
http://www.thearchitectureofspace.co.uk/help/applet/index.html
...thought it might be an easier way to share/explain my problem. Just refresh to run.

hope that is clear - If not let me know and I'll try to explain myself better. Thanks!!

note: this is kind of a continuation of an earlier post but having revised the code somewhat thought it better to start a new thread.
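
For the two subjects elements quoted in the question, a minimal way to spot a repeated value is to remember every value already seen in a HashSet. The sketch below is plain Java rather than a Processing sketch, the attribute values are typed in by hand instead of being read with proXML, and the names DuplicateTags and findDuplicates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateTags {

  // Returns each value that occurs more than once, in order of first repeat.
  static List<String> findDuplicates(List<String> values) {
    Set<String> seen = new HashSet<String>();
    List<String> duplicates = new ArrayList<String>();
    for (String v : values) {
      // Set.add() returns false when the value is already present
      if (!seen.add(v) && !duplicates.contains(v)) {
        duplicates.add(v);
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    // Hand-typed stand-in for the attribute values of the two <subjects> elements
    List<String> tags = new ArrayList<String>();
    tags.add("topic1");
    tags.add("topic2");
    tags.add("topic1");
    tags.add("topic3");
    tags.add("topic4");
    tags.add("topic5");
    System.out.println(findDuplicates(tags)); // prints [topic1]
  }
}
```

This only reports which values repeat. To draw links to the first occurrence, each value also needs to be tied to the object that represents it, which is what a map from name to object provides.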
Re: duplicate parsed data
Reply #1 - Feb 24th, 2010, 2:23pm
 
There are at least two points still unclear. Once you have found the duplicates, what will you do with them?
And do you plan to store this data? In what form?
Answering these questions would go a long way toward the solution.
Re: duplicate parsed data
Reply #2 - Feb 25th, 2010, 2:13am
 
I only want to display the data; I don't want to save or export it. If a duplicate is found, I only want to draw it as a node once, drawing any links to the first occurrence.

The project is a graphical representation of a bibliography; title, author, subjects. An entry is represented as a node(ellipse) for each attribute with lines drawn to represent links.

So book one is drawn as node(title), node(author) and nodes(subjects) with lines between title-author and title-subjects. This is repeated for each book, but if the same subject is found in a later book, I only want to draw links to the subject node that occurred before.

Does that answer your questions? If not let me know and I will try to explain myself better. Thank you!


Re: duplicate parsed data
Reply #3 - Feb 25th, 2010, 3:18pm
 
It is perfectly clear now.
So you need to keep the parsed information in memory, so that each new entry can be checked against the previous ones.

Should I need to do that, I would make three classes: one for books, one for authors, one for subjects. The last two would just hold a string and coordinates, and a method to draw themselves (radius, color...).
The book class would have a reference to an author (or several?), and some references to subjects. Perhaps in an array (if number is limited) or in an array list.
Then I would put the subjects in a HashMap keyed by name (a String key already provides hashCode() and equals(); the subject class only needs to implement them if the objects themselves are used as keys). When you try to add a parsed subject to a book, first check whether it is already in the HashMap. If it is, store a reference to the existing subject object. Otherwise, create a new object and add it to the HashMap.

This might seem a bit abstract. I am going to bed. If you need help, I can provide a rough implementation tomorrow.
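
The look-up-or-create step described above can be sketched in a few lines. This is plain Java under stated assumptions, not a Processing sketch: SubjectRegistry, Subject and getOrCreate are hypothetical names, and coordinates and drawing are left out so that only the HashMap logic remains.

```java
import java.util.HashMap;
import java.util.Map;

class SubjectRegistry {

  // Hypothetical minimal subject: a label only (no coordinates, no drawing)
  static class Subject {
    final String tag;
    Subject(String tag) { this.tag = tag; }
  }

  // Maps a tag name to the single Subject object that represents it
  private final Map<String, Subject> byTag = new HashMap<String, Subject>();

  // Returns the existing Subject for this tag, or creates and stores a new one.
  Subject getOrCreate(String tag) {
    Subject s = byTag.get(tag);
    if (s == null) {
      s = new Subject(tag);
      byTag.put(tag, s);
    }
    return s;
  }

  int size() { return byTag.size(); }

  public static void main(String[] args) {
    SubjectRegistry registry = new SubjectRegistry();
    Subject first = registry.getOrCreate("topic1"); // from book 1
    registry.getOrCreate("topic2");
    Subject again = registry.getOrCreate("topic1"); // from book 2, same tag
    System.out.println(first == again);  // prints true
    System.out.println(registry.size()); // prints 2
  }
}
```

Because both books end up holding a reference to the same object, a link drawn from either book goes to the same node position.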
Re: duplicate parsed data
Reply #4 - Feb 26th, 2010, 3:21am
 
thanks PhiLho!
I have added an ArrayList to the bookNode class to hold author and subject info. But having not used a HashMap before, yes, it does seem abstract, so any help would be much appreciated. I am reading through the reference help in Processing but so far (shocked)

I guess the hashmap would be implemented within the bookNode class?

revised code below - if anything is not clear or my code doesn't make sense i will try to clarify.

import proxml.*;
import proxml.XMLElement;

XMLElement branches;
XMLInOut xmlInOut;

bookNode[] bookParticles;
sourceNode[] sourceParticles;
subjNode[] subjParticles;

bookLinks[] bookConnections;
subjLinks[] subjConnections;
/*..................................................................*/
void setup() {
 size(300,300);
 background(50);
 smooth();

 //load nodes from file
 xmlInOut = new XMLInOut(this);
 xmlInOut.loadElement("testData4.xml");
}
/*..................................................................*/
void xmlEvent(XMLElement element){
 branches = element;
 int nbrNodes = branches.countChildren();

 XMLElement node;
 XMLElement title;
 XMLElement source;
 XMLElement subjects;

 // parse through branches: gets each row of data in xml file
 for(int i = 0; i < branches.countChildren();i++){
   //and get the elements of each row
   node = branches.getChild(i);

   //get the attributes of each element in a row/child - and add attributes to kin class
   
   /******BOOK DATA******/
   title = node.getChild(0);
   String labelName = title.getAttribute("name");
   //allocate the array once; re-creating it on every pass would discard earlier books
   if (bookParticles == null) bookParticles = new bookNode[nbrNodes];
   bookParticles[i] = new bookNode(new PVector(random(0,width), random(0,height)));
   bookParticles[i].name = labelName;
   
   /******AUTHOR DATA******/
   source = node.getChild(1);
   String labelOrigin = source.getAttribute("origin");
   //allocate the array once; re-creating it on every pass would discard earlier authors
   if (sourceParticles == null) sourceParticles = new sourceNode[nbrNodes];
   sourceParticles[i] = new sourceNode(new PVector(random(0,width), random(0,height)));
   sourceParticles[i].name = labelOrigin;
   //add author references to arrayList authorSources of bookNode class
   bookParticles[i].authorSources.add(source.getAttribute("origin"));
   
   /******SUBJECT DATA******/
   subjects = node.getChild(2);
   int nbr = subjects.countAttributes();
   //add subject references to arrayList subjectTags of bookNode class
   bookParticles[i].subjectTags.add(subjects.getAttribute("tag1"));
   bookParticles[i].subjectTags.add(subjects.getAttribute("tag2"));
   try {
     bookParticles[i].subjectTags.add(subjects.getAttribute("tag3"));
   }
   catch (Exception e) {
   }
   try {
     bookParticles[i].subjectTags.add(subjects.getAttribute("tag4"));
   }
   catch (Exception e) {
   }
   try {
     bookParticles[i].subjectTags.add(subjects.getAttribute("tag5"));
   }
   catch (Exception e) {
   }

   //WHAT TO DO???
   //loop through array list to check each value for previous duplicate values
   //if exists then ignore it - go to draw links: HOW???
   //Or use hashtable: HOW?!?!?!

   //subject data is organised as a child with multiple attributes
   //loop through attributes of each row/child
   //allocate once per book, not once per attribute, so earlier entries are kept
   subjParticles = new subjNode[nbr];
   subjConnections = new subjLinks[nbr];
   for(int j=0; j<nbr; j++){

     subjParticles[j] = new subjNode(new PVector(random(width), random(height)));
     subjParticles[j].display();

     //subject to book connections
     subjConnections[j] = new subjLinks(subjParticles[j].position.get(), bookParticles[i].position.get());
     subjConnections[j].display();
   }

   //book to author connections
   //allocate the array once rather than on every pass
   if (bookConnections == null) bookConnections = new bookLinks[nbrNodes];
   bookConnections[i] = new bookLinks(bookParticles[i].position.get(), sourceParticles[i].position.get());

   bookParticles[i].display();
   sourceParticles[i].display();
   bookConnections[i].display();
 }
}
/*..................................................................*/
class bookNode {

 PVector position;
 String name;
 ArrayList authorSources = new ArrayList();
 ArrayList subjectTags = new ArrayList();
 
 bookNode(PVector loc) {
   position = loc;
 }
 
 void display() {
   noStroke();
   fill(255,0,0);
   ellipse(position.x, position.y, 5, 5);
   fill(200,200,200,50);
   ellipse(position.x, position.y, 15, 15);

   //    println(authorSources);
   //    println(subjectTags);
 }

}
/*..................................................................*/
class sourceNode {     //same as subjNode

 PVector position;
 String name;
 
 sourceNode(PVector loc) {
   position = loc;
 }
 
 void display() {
   noStroke();
   fill(255,255,0);
   ellipse(position.x, position.y, 5, 5);
   fill(200,200,200,50);
   ellipse(position.x, position.y, 15, 15);
 }

}
/*..................................................................*/
class bookLinks {       //same as subjLinks

 PVector bookPos;
 PVector sourcePos;
 
 bookLinks(PVector locBook, PVector locSource) {
   bookPos = locBook;
   sourcePos = locSource;    
 }
 
 void display() {
   stroke(200);
   strokeWeight(0.5);
   line(bookPos.x,bookPos.y,sourcePos.x,sourcePos.y);
 }

}
Re: duplicate parsed data
Reply #5 - Feb 27th, 2010, 1:11pm
 
OK, I did it. Interesting exercise... I skipped the XML parsing since you seem to have mastered it. You just have to call the methods I show from inside the XML callback.
Code:
ArrayList/* <bookNode> */ bookParticles = new ArrayList();
HashMap/* <sourceNode> */ sources = new HashMap();
HashMap/* <subjectNode> */ subjects = new HashMap();

/*
bookLinks[] bookConnections;
subjLinks[] subjConnections;
*/
/*..................................................................*/
void setup() {
  size(300,300);
  background(50);
  smooth();

  /* Example of data:
  <branches>
    <node>
      <title name="book1"/>
      <source origin="author1"/>
      <subjects tag1="topic1" tag2="topic2"/>
    </node>
    <node>
      <title name="book2"/>
      <source origin="author2"/>
      <subjects tag1="topic1" tag2="topic3" tag3="topic4" tag4="topic5"/>
    </node>
  </branches>
  */
  // I lazily set up the data by hand, since you seem to manage XML parsing well...
  bookNode bn = new bookNode("book1", getRandomVector());
  bookParticles.add(bn);
  sourceNode src = new sourceNode("author1", getRandomVector());
  bn.add(src);
  subjectNode subj = new subjectNode("topic1", getRandomVector());
  bn.add(subj);
  subj = new subjectNode("topic2", getRandomVector());
  bn.add(subj);

  bn = new bookNode("book2", getRandomVector());
  bookParticles.add(bn);
  src = new sourceNode("author2", getRandomVector());
  bn.add(src);
  subj = new subjectNode("topic1", getRandomVector());
  bn.add(subj);
  subj = new subjectNode("topic3", getRandomVector());
  bn.add(subj);
  subj = new subjectNode("topic4", getRandomVector());
  bn.add(subj);
  subj = new subjectNode("topic5", getRandomVector());
  bn.add(subj);

  bn = new bookNode("book3", getRandomVector());
  bookParticles.add(bn);
  src = new sourceNode("author1", getRandomVector());
  bn.add(src);
  src = new sourceNode("author3", getRandomVector());
  bn.add(src);
  subj = new subjectNode("topic4", getRandomVector());
  bn.add(subj);
  subj = new subjectNode("topic2", getRandomVector());
  bn.add(subj);

  // Draw the books (with their links), then each unique author and subject once
  for (int i = 0; i < bookParticles.size(); i++) {
    bookNode b = (bookNode) bookParticles.get(i);
    b.display();
  }
  Collection csrc = sources.values();
  for (Iterator it = csrc.iterator(); it.hasNext(); ) {
    sourceNode s = (sourceNode) it.next();
    s.display();
  }
  Collection csubj = subjects.values();
  for (Iterator it = csubj.iterator(); it.hasNext(); ) {
    subjectNode s = (subjectNode) it.next();
    s.display();
  }
}

PVector getRandomVector()
{
  return new PVector(random(width), random(height));
}

// Take out common parts of classes and put them in a separate class
// Classes extending this one just inherit (get) their fields and methods.
class Node {

  PVector position;
  String name;
  color fillColor;

  Node(PVector loc, color c) {
    position = loc;
    fillColor = c;
  }

  void display() {
    noStroke();
    fill(fillColor);
    ellipse(position.x, position.y, 5, 5);
    fill(200,200,200,50);
    ellipse(position.x, position.y, 15, 15);
  }

}

class bookNode extends Node {

  String title;

  ArrayList authorSources = new ArrayList();
  ArrayList subjectTags = new ArrayList();

  bookNode(String t, PVector loc) {
    super(loc, #FF0000);
    title = t;
  }

  void add(sourceNode src) {
    // Is that author already seen?
    sourceNode sn = (sourceNode) sources.get(src.origin);
    if (sn == null) { // No
      // Add it to the list of authors
      // with the name as key
      sources.put(src.origin, src);
    } else {
      // Yes, just use the reference to the existing source
      src = sn;
    }
    // Add it to the list of sources of the book
    authorSources.add(src);
  }

  // We can have two identically named methods as long
  // as their lists of parameters are different
  void add(subjectNode subj) {
    // Is that tag already seen?
    subjectNode sn = (subjectNode) subjects.get(subj.tag);
    if (sn == null) { // No
      // Add it to the list of tags
      // with the name as key
      subjects.put(subj.tag, subj);
    } else {
      // Yes, just use the reference to the existing subject
      subj = sn;
    }
    // Add it to the list of subjects of the book
    subjectTags.add(subj);
  }

  void display() {
    super.display();
    stroke(200);
    strokeWeight(0.5);
    // Draw a line from the book to each of its authors and subjects
    for (int i = 0; i < authorSources.size(); i++) {
      sourceNode s = (sourceNode) authorSources.get(i);
      line(position.x, position.y, s.position.x, s.position.y);
    }
    for (int j = 0; j < subjectTags.size(); j++) {
      subjectNode s = (subjectNode) subjectTags.get(j);
      line(position.x, position.y, s.position.x, s.position.y);
    }
  }

}

/*..................................................................*/
class sourceNode extends Node {

  String origin;

  sourceNode(String source, PVector loc) {
    super(loc, #99FF00);
    origin = source;
  }

  /* @Override */
  int hashCode() { return origin.hashCode(); }
  // instanceof is false for null and for objects of another type
  /* @Override */
  boolean equals(Object o) { return o instanceof sourceNode && origin.equals(((sourceNode) o).origin); }

}

class subjectNode extends Node {

  String tag;

  subjectNode(String subject, PVector loc) {
    super(loc, #FFBB00);
    tag = subject;
  }

  /* @Override */
  int hashCode() { return tag.hashCode(); }
  // instanceof is false for null and for objects of another type
  /* @Override */
  boolean equals(Object o) { return o instanceof subjectNode && tag.equals(((subjectNode) o).tag); }

}


/*..................................................................*/
// In the end, I don't use these...
class Link {

  PVector bookPos;
  PVector sourcePos;

  Link(PVector locBook, PVector locSource) {
    bookPos = locBook;
    sourcePos = locSource;
  }

  void display() {
    stroke(200);
    strokeWeight(0.5);
    line(bookPos.x, bookPos.y, sourcePos.x, sourcePos.y);
  }

}

class bookLink extends Link {
  bookLink(PVector locBook, PVector locSource) {
    super(locBook, locSource);
  }
}

class subjLink extends Link {
  subjLink(PVector locBook, PVector locSource) {
    super(locBook, locSource);
  }
}
Re: duplicate parsed data
Reply #6 - Mar 1st, 2010, 6:31am
 
PhiLho, wow. Thank you!

I've updated my XML attempt to use your methods and it's working, spot on!

Felt like I was banging my head against a brick wall through February, but today, March 1st, the sun here in London is shining. Time for a cup of tea!!

Thank you.
Re: duplicate parsed data
Reply #7 - Mar 6th, 2010, 9:32am
 
Just trying to get my head around how the HashMap actually works and what it is actually storing. Apart from the subject (subj.tag), does it also store a value for how many times each subject occurs?
Re: duplicate parsed data
Reply #8 - Mar 6th, 2010, 12:45pm
 
No, but since that's a class, it is very easy to add:
Code:
class subjectNode extends Node {
 
 String tag;
 int occurrenceNb;
 
 subjectNode(String subject, PVector loc) {
   super(loc, #FFBB00);
   tag = subject;
   occurrenceNb = 1;
 }
 
 /* @Override */
 int hashCode() { return tag.hashCode(); }
 /* @Override */
 boolean equals(Object o) { return o instanceof subjectNode && tag.equals(((subjectNode) o).tag); }

}

class bookNode extends Node {
// [...]
 void add(subjectNode subj) {
   // Is that tag already seen?
   subjectNode sn = (subjectNode) subjects.get(subj.tag);
   if (sn == null) { // No
     // Add it to the list of tags
     // with the name as key
     subjects.put(subj.tag, subj);
   } else {
     // Yes, reuse the existing subject and count this occurrence
     subj = sn;
     sn.occurrenceNb++;
   }
   // Add it to the list of subjects of the book
   subjectTags.add(subj);
 }
// [...]
}
(untested)
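
To make explicit what a HashMap stores: each entry is a key paired with one value, nothing more. If only the counts are wanted, the value itself can be the count. The following is a plain-Java sketch of that alternative, not the method above; TagCounter and countTags are made-up names, and the tags are the ones from the three hand-built books in the earlier example.

```java
import java.util.HashMap;
import java.util.Map;

class TagCounter {

  // Builds a map from each tag to the number of times it appears.
  static Map<String, Integer> countTags(String[] tags) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String tag : tags) {
      Integer n = counts.get(tag);            // null if the tag is new
      counts.put(tag, n == null ? 1 : n + 1); // put() replaces the old value
    }
    return counts;
  }

  public static void main(String[] args) {
    // Subject tags of book1, book2 and book3, in parsing order
    String[] tags = {
      "topic1", "topic2",
      "topic1", "topic3", "topic4", "topic5",
      "topic4", "topic2"
    };
    Map<String, Integer> counts = countTags(tags);
    System.out.println(counts.get("topic1")); // prints 2
    System.out.println(counts.get("topic3")); // prints 1
    System.out.println(counts.size());        // prints 5
  }
}
```

Keeping the counter inside the node class, as above, has the advantage that the count travels with the object that is drawn, so no second map is needed.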
Re: duplicate parsed data
Reply #9 - Mar 9th, 2010, 10:21am
 
a-ha!

I managed to do it before seeing this by creating another ArrayList to hold every instance of a subject and adding a counter to the subject class.
I then check whether each value in the ArrayList is in the HashMap, incrementing the counter in the class each time it occurs.

But your method is much nicer, and I assume faster, as it uses what's already in the programme.

thanks again!
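
For comparison, the two-pass approach described in this last post might look roughly like the sketch below. It is a guess at the shape of that code, written in plain Java with hypothetical names (TwoPassCount, Subject, countOccurrences), not the poster's actual sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassCount {

  // Hypothetical subject holding only a tag and a counter
  static class Subject {
    final String tag;
    int count = 0;
    Subject(String tag) { this.tag = tag; }
  }

  // Second pass: walk the list of every parsed tag and increment the
  // counter of the matching unique Subject in the map.
  static void countOccurrences(List<String> allTags, Map<String, Subject> unique) {
    for (String tag : allTags) {
      Subject s = unique.get(tag);
      if (s != null) {
        s.count++;
      }
    }
  }

  public static void main(String[] args) {
    List<String> allTags = new ArrayList<String>();
    Map<String, Subject> unique = new HashMap<String, Subject>();
    String[] parsed = { "topic1", "topic2", "topic1", "topic3" };
    // First pass: record every instance and register each tag once
    for (String tag : parsed) {
      allTags.add(tag);
      if (!unique.containsKey(tag)) {
        unique.put(tag, new Subject(tag));
      }
    }
    countOccurrences(allTags, unique);
    System.out.println(unique.get("topic1").count); // prints 2
  }
}
```

Both versions do work proportional to the number of parsed tags; counting at insertion time simply avoids the extra list and the second loop.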