I am working on an experimental non-fiction film as part of my Media Arts MFA program at the City College of New York. I have started a program in Processing that parses text from a large set of emails into an associative dictionary. I need help getting to the next stage: visualizing the associations between the same words in different emails. Because of my humble coding skills, time constraints, and a considerable workload for my other coursework, I don't think I'll be able to execute this idea fully without help. The film is due the first of May; ideally, the program should be largely functional by early April. I can offer some compensation, but I'm on a student budget, so please be negotiable!
The code that builds the associative dictionary database is functioning now.
I would like my sketch to parse some text files in a subfolder of the sketch's data folder, or perhaps in an arbitrary folder. However, on a Mac the OS always creates a file called ".DS_Store". I would like my sketch to only read files that have a .txt extension. In general I would like to know how to write sketches that can use arbitrary file filters. Unfortunately the JavaDoc regarding the FileFilter interface is a bit over my head, and the processing.org documentation on file I/O is sparse. Any help or code snippets would be greatly appreciated!
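Here is roughly what I am imagining (untested), using java.io.FilenameFilter so that list() only returns names ending in ".txt":

// Hoped-for approach (untested): only accept file names that end in ".txt",
// so ".DS_Store" and anything else in the folder gets skipped.
// File and FilenameFilter live in java.io (I think Processing imports that
// automatically; otherwise an "import java.io.*;" line would go at the top).
String[] listTextFiles(String dir) {
  File folder = new File(dir);
  if (!folder.isDirectory()) {
    return null;
  }
  FilenameFilter txtOnly = new FilenameFilter() {
    public boolean accept(File d, String name) {
      return name.toLowerCase().endsWith(".txt");
    }
  };
  return folder.list(txtOnly);
}

If that is on the right track, I am guessing the test inside accept() could be swapped for any condition to get an arbitrary filter.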
I'd love to be able to change the white background to a quadrille ruled background. I think that would help me see the indentation levels better. Is it possible to do that?
I have a HashMap that uses a String as the key and a composite Object as the value. I have been able to convert the keys used in the HashMap into an array of Strings with a two-step process:
HashMap words;
String[] keyWords;
Object[] temp;

temp = words.keySet().toArray();
keyWords = new String[temp.length];
for (int i = 0; i < temp.length; i++) {
  keyWords[i] = temp[i].toString();
}
Is there any way to do away with the temporary Object[] array and the for loop that converts the objects to Strings?
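Ideally I'd want something like the one-liner below (just a guess on my part, untested):

// Guess (untested): pass a String[] into toArray() so it allocates an array of
// the right runtime type; the cast is needed because my HashMap isn't typed.
String[] keyWords = (String[]) words.keySet().toArray(new String[0]);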
Thanks!
I am trying to create an associative dictionary. This is a database which contains a record for every unique word occurring in an arbitrary set of text files. Each record should contain the word, a list of the files that contain the word, and the indices within each of those files of every occurrence of the word. Something like this:
Word      Files       Occurrences
able      file1.txt   0, 20, 35, ...
          file2.txt   330, 450, 453, ...
baker     file1.txt   1, 15
          file3.txt   1, 4, 9
charlie   fileN.txt   i, i1, i2, ...
What I have so far is a HashMap using a Word class as the value. The Word class contains another HashMap, this time for each file that contains an occurrence of a given word. It also contains an ArrayList for the indices at which that word occurs. I've been struggling with this for a while. Is there a design pattern for this situation? I'd hate to reinvent the wheel, especially as I'm a Processing (and programming generally) noob.
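To make the shape of the data concrete, here is the bare-bones version of what I think I'm after, using nothing but nested maps (the names are placeholders I made up):

HashMap dict = new HashMap();  // word -> (fileName -> ArrayList of indices)

// Record one occurrence of 'word' at position 'index' in 'fileName'
void addOccurrence(String word, String fileName, int index) {
  HashMap files = (HashMap) dict.get(word);
  if (files == null) {              // first time this word has been seen at all
    files = new HashMap();
    dict.put(word, files);
  }
  ArrayList indices = (ArrayList) files.get(fileName);
  if (indices == null) {            // first time this word has been seen in this file
    indices = new ArrayList();
    files.put(fileName, indices);
  }
  indices.add(new Integer(index));  // remember where it occurred
}

In my actual code the inner map lives inside a Word class.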
I've attached my code below; it's messy, I know. Sorry.
HashMap words;    // word -> Word record (the associative dictionary)
String[] tokens;  // all words from the current input file
int counter;      // index of the current token within that file

void setup() {
  String path = sketchPath + "/data";
  String[] files = listFileNames(path);
  println(files);
  words = new HashMap();

  for (int a = 0; a < files.length; a++) { // repeat this block for each file in 'files'
    // Load the file and chop it up into tokens
    String[] lines = loadStrings(files[a]);
    String allText = join(lines, " ");
    tokens = splitTokens(allText, " ,.?!:;[]-");
    println("Done chopping up file #: " + a + " called " + files[a]);

    // Look at the words one at a time
    counter = 0; // start from the first token of each new file
    while (counter < tokens.length) {
      String s = tokens[counter];
      // Is the word already in the HashMap?
      if (words.containsKey(s)) {
        println("database already has an entry for " + s.toUpperCase());
        Word w = (Word) words.get(s);
        w.updateWord(files[a], counter);
      }
      else {
        println("make a new entry for " + s.toUpperCase());
        Word w = new Word(files[a], s, counter);
        words.put(s, w);
      }
      counter++;
    } // end of while loop for each word in 'tokens'
  } // end of for loop for each file
  dumpDictionary();
}
/* A routine to print out the contents of the main hash map:
   each word followed by the number of files it appears in. */
void dumpDictionary() {
  Iterator i = words.keySet().iterator();
  while (i.hasNext()) {
    String w = (String) i.next();
    print(w + ": ");
    Word wd = (Word) words.get(w);
    HashMap fi = wd.fileIndices;
    int fs = fi.size();
    println(fs);
    //Iterator j = fi.values().iterator();
    /*
    while (j.hasNext()) {
      ArrayList ix = (ArrayList) fi.get(w);
      int sz = ix.size();
      for (int k = 0; k < sz; k++) {
        Integer m = (Integer) ix.get(k);
      }
    }
    */
  }
}
// This function returns all the file names in a directory as an array of Strings
String[] listFileNames(String dir) {
  File file = new File(dir);
  if (file.isDirectory()) {
    String[] names = file.list();
    return names;
  } else {
    // If it's not a directory
    return null;
  }
}
/* Word object to store in the associative dictionary.
   Each object should store the indices of the occurrences of the word in every file that's sent to it.
   Indices should be kept as an Array this time; use append() to update.
   For the next version try using an ArrayList.
   ArrayList version seems to be working, now try it in the hash map experiment.
   Handle all checking inside the class.
*/
class Word {
  int count;             // total number of occurrences of this word
  String word;
  HashMap fileIndices;   // fileName -> ArrayList of indices, one entry per file

  Word() { // no-arg constructor
    fileIndices = new HashMap();
  }

  Word(String fileName, String word, int index) {
    fileIndices = new HashMap();
    this.word = word;
    count = 1;
    ArrayList indices = new ArrayList();
    indices.add(new Integer(index));
    fileIndices.put(fileName, indices);
  }

  /* The 'updateWord()' method takes a file name and an index as parameters.
     If 'fileName' is already stored, just update that file's entry with the new index;
     otherwise add 'fileName' to the database and set its first entry to 'index'.
  */
  void updateWord(String fileName, int index) {
    //println("update this word with "+fileName+" and index "+index);
    count++;
    // First, check to see if this file has already been added to this word
    ArrayList indices = (ArrayList) fileIndices.get(fileName);
    if (indices == null) {
      // First occurrence in this file: start a new list of indices
      indices = new ArrayList();
      fileIndices.put(fileName, indices);
    }
    // Record where the word occurred in this file
    indices.add(new Integer(index));
  }
}
I've been teaching myself Processing and exploring some contributed libraries. The SoundCipher site has some examples using Callbacks. I can intuit what they are generally, but where can I read up on the specifics? Links to Javadocs or tutorial pages greatly appreciated!
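For what it's worth, here is my rough mental model of the pattern as a generic Processing sketch (nothing SoundCipher-specific, and all the names are made up), in case someone can tell me whether I have the right idea:

// Made-up example of my understanding of callbacks: the "library" object holds a
// reference to my listener object and calls a method on it when something happens.

interface TickListener {
  void onTick(int count);      // the callback method the library will invoke
}

class Metronome {              // stands in for a library class
  TickListener listener;
  int ticks;

  void setListener(TickListener l) {
    listener = l;
  }

  void tick() {
    ticks++;
    if (listener != null) {
      listener.onTick(ticks);  // control comes back to my code here
    }
  }
}

Metronome m = new Metronome();

void setup() {
  // Register an anonymous listener; its onTick() runs whenever the metronome ticks.
  m.setListener(new TickListener() {
    public void onTick(int count) {
      println("tick #" + count);
    }
  });
  m.tick();  // prints "tick #1"
}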
-L