Concordance: txt, loadStrings()

edited June 2016 in How To...

Hi all,

I'm new to Processing, but I wanted to learn it through my thesis project [tough guy]. The highlight of the project is creating a concordance book from a famous epic poem (also open source). [The poem is in a txt file.]

So, for every word the code should report: the total word count, the count in each chapter, and the verse positions.

Example: "Gamble = 340, Ch. 1 - 120, Ch. 2 - 220. Vrs: 142, 140, 3, 5, 66, [...]."

I found loadStrings() and that's fine, but what I don't understand is this: can I call loadStrings() several times, once per chapter, as a sort of delimiter for the per-chapter counter?

My first idea was to take the txt file and divide it into one file per chapter. Is that the right way to do it?

I watched Daniel's lessons on YouTube, very helpful [thanks], and I bought all the books. The problem is I don't really know where to start. Sorry for the trouble, and thanks for your help.

Answers

  • Are the verse positions per chapter or for the entire book?

    Which book is it?

    Is the txt file for the entire book, or is there one txt file per chapter?

    If it is one txt file for the entire book, load the entire book and split it up into chapters.

    Have an array textsOfTheChapters[] where you store the texts.
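Here is a minimal sketch of that split in plain Java. Processing is Java underneath, and in a sketch `lines` would come from `loadStrings()`. It assumes every chapter starts with a heading line that begins with a marker word. The marker "CANTO" used below is hypothetical, so check what the headings in your txt file actually look like. A verse that happens to start with the marker would also be taken as a heading.

```java
import java.util.ArrayList;
import java.util.List;

class ChapterSplitter {
  // Splits the lines of a whole book into one text per chapter.
  // ASSUMPTION: each chapter starts with a heading line beginning with `marker`
  // (e.g. a hypothetical "CANTO"); lines before the first heading are ignored.
  static String[] split(String[] lines, String marker) {
    List<String> textsOfTheChapters = new ArrayList<>();
    StringBuilder current = null;
    for (String line : lines) {
      if (line.trim().startsWith(marker)) {
        // a new heading closes the previous chapter (if any) and opens a new one
        if (current != null) textsOfTheChapters.add(current.toString());
        current = new StringBuilder();
      } else if (current != null) {
        current.append(line).append('\n');
      }
    }
    if (current != null) textsOfTheChapters.add(current.toString());
    return textsOfTheChapters.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] book = {"CANTO I", "first verse", "second verse", "CANTO II", "third verse"};
    String[] chapters = split(book, "CANTO");
    System.out.println(chapters.length);  // 2
    System.out.print(chapters[1]);        // third verse
  }
}
```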

  • edited April 2016

    Hi Mr Chrisir. Thanks for the response, and sorry for the bad grammar, I'm still learning English.

    Well, starting with the first question: no, I haven't seen it yet. I will right now.

    The verse positions are for the entire book.

    The book is The Divine Comedy by Dante Alighieri.

    I have both of them.

    OK, I'll do it. Thanks.

  • Are the txt files formatted like

    verse number, text ?

    Then detect the comma and split there.

    Maybe you can use split() with

    ","
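If the lines did carry a number, a sketch like this would separate it from the verse text. It assumes a hypothetical `number, text` layout. It splits only at the first comma (`split(",", 2)`), because the verse text itself contains commas. Processing's `split(line, ",")` would cut at every comma.

```java
class VerseLine {
  // ASSUMPTION: a line looks like "142, Nel mezzo del cammin di nostra vita,"
  // i.e. a verse number, a comma, then the verse text (which may contain commas too).
  static int number(String line) {
    return Integer.parseInt(line.split(",", 2)[0].trim());
  }

  static String text(String line) {
    String[] parts = line.split(",", 2);  // limit 2: split only at the first comma
    return parts.length > 1 ? parts[1].trim() : "";
  }

  public static void main(String[] args) {
    String line = "1, Nel mezzo del cammin di nostra vita,";
    System.out.println(number(line));  // 1
    System.out.println(text(line));    // Nel mezzo del cammin di nostra vita,
  }
}
```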

  • I don't quite get it, but:

    • I have 3 txt files, one per chapter: Hell, Paradise and Purgatory. [I split them myself]

    • Then I have the single [original] file that contains the whole book.

    • Some lines don't end with a comma [if I understand correctly].

    To simplify the question, here is an example:

    [Screenshot ("Capture") of the txt file; image not preserved]

  • Thank you.

    OK, so there are no line numbers in the txt file.

  • So I guess the only way is to transform it into XML. Right?

  • No, XML is not required here.

    Make backup copies of all 4 txt files first.

    Use a HashMap to keep track of which words you already have.

  • Here is an example (not mine):

  • //  https://forum.processing.org/two/discussion/14188/finding-the-index-numbers-of-an-array-of-similiar-strings#latest
    
    //  list all the index values of the String array member, c (like {3,4,7})?
    
    import java.util.Map;
    
    String[] source= {"a", "b", "d", "c", "c", "e", "f", "c"};
    
    // Note the HashMap's "key" is a String and "value" is an int array 
    HashMap <String, int[] > hm = new HashMap<String, int[]>();
    
    void setup() {
      size(300, 300);
      background(0);
    
      // Putting key-value pairs in the HashMap
      setupArrays();
    
      // Using an enhanced loop to iterate over each entry
      for (Map.Entry me : hm.entrySet()) {
        print(me.getKey() + " is ");
        printArray(me.getValue());
      }
    
      // We can also access values by their key
      int[] val = null;
      val=hm.get("c");
      if (val!=null) {
        println("\n==================\n now show c : ");
        printArray(val);
      }//if
    }//func
    
    void draw() {
      // 
      background(0);
      int columnNumber=0;
      for (char letter = 'a'; letter<='z'; letter++) {
        String letterAsString = trim(str(letter)); 
        // Exists?
        if (hm.containsKey(letterAsString)) {
          int[] val = null;
          val=hm.get(letterAsString);
          // Found? 
          if (val!=null) {
            fill(255); // white 
            text(letterAsString, 56+columnNumber*19, 50);
            fill(255, 0, 0); // red
            // loop over the array
            for (int i=0; i<val.length; i++) {
              // show values of the array under each other
              text(val[i], 56+columnNumber*19, 50+20+17*i);
            }//for
            columnNumber++;
          }// if
        }// if
      }//for
    }//func 
    
    void setupArrays() {
      int i=0;
      // looping over the entire source array
      for (String currentString : source) {
    
        // do we have the letter already?
        if (hm.containsKey(currentString)) {
          // yes, old one
          int [] temp = hm.get(currentString);
          temp = append(temp, i);
          hm.put(currentString, temp);
        }// 
        else {
          // No, new letter 
          int[] temp = new int [1];
          temp[0]=i;
          hm.put(currentString, temp);
        }//else 
    
        // increase index
        i++;
      }//for
    }//function
    //
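The same bookkeeping works for words instead of letters. This plain-Java sketch swaps the `containsKey()`/`append()` pattern above for `computeIfAbsent()`, which creates the list the first time a key is seen. `WordPositions` is a made-up name.

```java
import java.util.*;

class WordPositions {
  // Maps each entry of `words` to the list of indices where it occurs.
  static Map<String, List<Integer>> index(String[] words) {
    Map<String, List<Integer>> hm = new TreeMap<>();  // TreeMap keeps keys in String order
    for (int i = 0; i < words.length; i++) {
      // creates an empty list on first sight of the word, then appends the index
      hm.computeIfAbsent(words[i], k -> new ArrayList<>()).add(i);
    }
    return hm;
  }

  public static void main(String[] args) {
    String[] source = {"a", "b", "d", "c", "c", "e", "f", "c"};
    System.out.println(index(source).get("c"));  // [3, 4, 7]
  }
}
```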
    
  • edited April 2016

    This line is what you want to achieve:

    "Gamble = 340, Ch. 1 - 120, Ch. 2 - 220. Vrs: 142, 140, 3, 5, 66, [...]."

    Now, I think you can do Gamble with the HashMap code above.

    You can also do 340.

    Please loop through the entire book first to get

    • Gamble

    • 340

    • all verses (since the verse positions are for the entire book)

    NOW Ch. 1 - 120, Ch. 2 - 220:

    Now load Ch. 1 only

    (A) and, keeping the data you already have,

    count

    then store the count result.

    Load Ch. 2 only

    and repeat the above from (A).
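Step (A) can be sketched like this, assuming each chapter's text is already in its own string (for example, from the per-chapter files). Instead of counting the whole book first and then each chapter, it makes one pass per chapter and adds to both counters at once, since the book total is just the sum of the chapter counts. The simplified delimiter regex and all names here are illustrative assumptions.

```java
import java.util.*;

class ChapterCounts {
  // Total count plus one count per chapter for a single word.
  static class WordData {
    int countEntireBook;
    int[] countPerChapter;
    WordData(int chapters) { countPerChapter = new int[chapters]; }
  }

  // ASSUMPTION: chapters[c] holds the full text of chapter c; words are separated
  // by whitespace or , . ; : ! ? (a simplified delimiter set).
  static Map<String, WordData> count(String[] chapters) {
    Map<String, WordData> hm = new HashMap<>();
    for (int c = 0; c < chapters.length; c++) {              // step (A), once per chapter
      for (String w : chapters[c].toLowerCase().split("[\\s,.;:!?]+")) {
        if (w.isEmpty()) continue;                           // a leading delimiter yields ""
        WordData wd = hm.computeIfAbsent(w, k -> new WordData(chapters.length));
        wd.countEntireBook++;
        wd.countPerChapter[c]++;
      }
    }
    return hm;
  }

  public static void main(String[] args) {
    Map<String, WordData> hm = count(new String[] {"Gamble, gamble.", "gamble here"});
    WordData g = hm.get("gamble");
    System.out.println(g.countEntireBook + " " + Arrays.toString(g.countPerChapter)); // 3 [2, 1]
  }
}
```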

    OOP

    To be able to store the data, don't use an int[] as in my example but a class instead (OOP).

    The class WordData holds all the data you want to collect for one word:

    String Word; // Gamble  
    
    int countEntireBook; // 340 
    int countChapter1;   // 120 
    int countChapter2;   // 220 
    
    int[] verses;      // 142, 140, 3, 5, 66  
    

    See the tutorial on OOP:

    https://www.processing.org/tutorials/objects/

  • //  https://forum.processing.org/two/discussion/14188/finding-the-index-numbers-of-an-array-of-similiar-strings#latest
    
    //  list all the index values of the String array member, c (like {3,4,7})?
    
    import java.util.Map;
    
    String[] POEM = {"cat", "ball", "d", "c", "c", "e", "f", "c", "cat", "cat", "s", "ball", "cat"};
    
    // Note the HashMap's "key" is a String and "value" is a class
    HashMap <String, WordData > hm = new HashMap<String, WordData>();
    
    void setup() {
      size(900, 300);
      background(0);
    
      // Putting key-value pairs in the HashMap
      setupArrays();
    
      // Using an enhanced loop to iterate over each entry
      for (Map.Entry me : hm.entrySet()) {
        print(me.getKey() + " is ");
        WordData wd = (WordData) me.getValue(); 
        printArray(wd.wordAsString);
      }
    
      // We can also access values by their key
      WordData val = null;
    
      val=hm.get("c");
    
      if (val!=null) {
        print("\n==================\n now show c : ");
        println(val.countEntireBook);
      }//if
    }//func
    
    void draw() {
      // 
      background(0);
      int columnNumber=0;
      int factorX = 39; 
    
      for (Map.Entry me : hm.entrySet()) {
    
        WordData wd = (WordData) me.getValue(); 
    
        String letterAsString = trim((wd.wordAsString)); 
        // Exists?
        if (hm.containsKey(letterAsString)) {
    
          WordData val = null;
          val=hm.get(letterAsString);
    
          // Found? 
          if (val!=null) {
            // yes 
            // show the word in white
            fill(255); // white 
            text(letterAsString, 56+columnNumber*factorX, 50);
            // show the count in BLUE 
            fill(0, 0, 255); // blue 
            text(wd.countEntireBook, 56+columnNumber*factorX, 70);
            // show the positions in RED
            fill(255, 0, 0); // red
            // loop over the array
            for (int i=0; i<val.verses.length; i++) {
              // show values of the array under each other
              text(val.verses[i], 56+columnNumber*factorX, 50+20+40+(19*i));
            }//for
            columnNumber++;
          }// if
        }// if
      }//for
    }//func 
    
    void setupArrays() {
      int i=0;
      // looping over the entire POEM array
      for (String currentString : POEM) {
    
        // do we have the letter already?
        if (hm.containsKey(currentString)) {
          // yes, old one
          // we load the entire class       
          WordData temp = hm.get(currentString);
          // we change data in the class
          temp.countEntireBook++;
          temp.verses = append(temp.verses, i);
          // we put the updated class back in 
          hm.put(currentString, temp);
        }// 
        else {
          // No, new letter 
          WordData temp = new WordData();
          temp.wordAsString=currentString;
          temp.countEntireBook=1;
          temp.verses = append(temp.verses, i);
          hm.put(currentString, temp);
        }//else 
    
        // increase index
        i++;
      }//for
    }//function
    
    // =================================================================
    
    class WordData {
    
      // WordData holds all the data you want to collect for one word:
    
      String wordAsString=""; // Gamble  
    
      int countEntireBook; // 340 
      int countChapter1;   // 120 
      int countChapter2;   // 220 
    
      int[] verses = new int[0];      // 142, 140, 3, 5, 66
    
    
      WordData() {
      }
    }
    
    // 
    
  • Many thanks, Chrisir, I'll keep you updated. Have a good weekend.

  • Hi Chrisir!

    I'm back to update you (if you're interested). I discussed it with my supervisor and we changed a few things to make them easier to program.

    • I no longer need a per-chapter counter, because we decided to split the chapters themselves into separate txt files.

    • So I only need a word counter, sorted a-z (this is done), and a verse counter, which should be a "line counter"(?) (I'm still scratching my head over this one and can't find good code for it).

    • A question: is it possible to export this list to a multi-page PDF?

    If you want to see how far I've got, I'll send you the code.

    Best,

    FK

  • a counter verse, which should be a "counter lines(?)" (on this i'm still scraping my head and i can't find a good code).

    When you loop over the lines with a for loop (let's say with variable i), isn't the value of i just the line number, i.e. the verse number?
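Here is a minimal sketch of that idea. It assumes one line of the txt file is one verse, so the loop index `i` (plus 1) is the verse number recorded for every word on that line. Splitting on `[^\p{L}]+` (anything that is not a letter) is an assumption that keeps accented letters inside words. A word that appears twice in one verse gets that verse number twice, just like the counting code later in the thread.

```java
import java.util.*;

class VerseIndex {
  // Maps each word to the verse numbers it appears in.
  // ASSUMPTION: one line of the txt file == one verse, so verse = line index + 1.
  static Map<String, List<Integer>> build(String[] lines) {
    Map<String, List<Integer>> verses = new TreeMap<>();  // sorted by String order
    for (int i = 0; i < lines.length; i++) {
      // split on anything that is not a letter (\p{L} also covers accented letters)
      for (String w : lines[i].toLowerCase().split("[^\\p{L}]+")) {
        if (w.isEmpty()) continue;  // a line starting with punctuation yields a leading ""
        verses.computeIfAbsent(w, k -> new ArrayList<>()).add(i + 1);
      }
    }
    return verses;
  }

  public static void main(String[] args) {
    String[] lines = {"The cat sat,", "the dog ran;", "a cat."};
    Map<String, List<Integer>> v = build(lines);
    System.out.println("the: " + v.get("the").size() + ", verse " + v.get("the")); // the: 2, verse [1, 2]
    System.out.println("cat: " + v.get("cat"));                                    // cat: [1, 3]
  }
}
```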

    export this list on a multi-pages PDF?

    This might get you started: https://www.processing.org/tutorials/print/

  • Sorry, I should have been more precise: by "verse counter" I mean it has to say in which verses each word is located. Ex.: "The: 4, verse 4, 5, 6, 8."

    I saw that PDF guide, and I spotted PDFBox, but even that doesn't fit. I guess I'm going to export it without that.

    Thanks anyway, Chrisir.

  • Isn't one verse one line?

  • Answer ✓

    You can just write a text file, load it in MS Word, format it, and save it as PDF.

    Use saveStrings().
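For the text-file route, here is a sketch that formats entries in the "word: count; vers. ..." style used later in the thread and writes them one per line. In a Processing sketch, `saveStrings()` would do the writing. The plain-Java version below uses `Files.write()` with a temporary file so it runs anywhere. The map shape (word to list of verse numbers) is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

class ConcordanceExport {
  // Formats one entry, e.g. "the: 4; vers. 4, 5, 6, 8."
  static String entry(String word, List<Integer> verses) {
    StringJoiner joined = new StringJoiner(", ");
    for (int v : verses) joined.add(String.valueOf(v));
    return word + ": " + verses.size() + "; vers. " + joined + ".";
  }

  // Writes one entry per line as UTF-8 text and returns the written lines.
  static List<String> save(Path path, Map<String, List<Integer>> index) throws IOException {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
      lines.add(entry(e.getKey(), e.getValue()));
    }
    Files.write(path, lines, StandardCharsets.UTF_8);
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Map<String, List<Integer>> index = new TreeMap<>();
    index.put("the", Arrays.asList(4, 5, 6, 8));
    index.put("cat", Arrays.asList(1, 3));
    Path out = Files.createTempFile("concordance", ".txt");
    save(out, index).forEach(System.out::println);
    // cat: 2; vers. 1, 3.
    // the: 4; vers. 4, 5, 6, 8.
  }
}
```

The resulting txt file can then be opened in a word processor and exported to PDF, as suggested above.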

  • edited June 2016

    Hi Chrisir, great news: the code is almost finished. I'll post the project here as soon as possible. Just one thing: how can I put whole words into the same delimiter string? For ex.: String delimiters = " ,.?!§\';:`()[]-\""; I want to add words or letters separated by spaces to it. Is that possible? I tried "\/s/" and it didn't work. Same with *example (with commas).

  • Spaces?

    Did you try putting a space character between ? and !

    i.e. just "? !"

  • String t1 = "Hello you kyoto";
    
    String[] arrayString=split(t1, " "); 
    
    printArray( arrayString); 
    

    Ah, you mean this, splitTokens():

    String t1 = "Hello you kyoto! What is on? Here";
    
    String[] arrayString = splitTokens(t1, "?! " ); 
    
    printArray( arrayString); 
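`splitTokens()` treats every character of its delimiter string as a separate delimiter, so a multi-letter word can't be one. A regular expression can express "any of these characters, or any of these whole words". Here is a plain-Java sketch; `String` and `java.util.regex.Pattern` work the same way inside Processing. The helper name and its signature are hypothetical.

```java
import java.util.*;
import java.util.regex.Pattern;

class WordDelimiters {
  // Splits `text` at any run of the single characters in `chars` and/or the
  // whole words in `words` (e.g. a hypothetical "Canto").
  static String[] split(String text, String chars, String... words) {
    List<String> alternatives = new ArrayList<>();
    if (!chars.isEmpty()) {
      StringBuilder cls = new StringBuilder("[");
      for (char c : chars.toCharArray()) {
        // backslash-escape every non-letter/digit so regex metacharacters stay literal
        if (!Character.isLetterOrDigit(c)) cls.append('\\');
        cls.append(c);
      }
      alternatives.add(cls.append(']').toString());
    }
    for (String w : words) {
      alternatives.add("\\b" + Pattern.quote(w) + "\\b");  // whole word only, not inside "This"
    }
    Pattern p = Pattern.compile("(?:" + String.join("|", alternatives) + ")+",
        Pattern.UNICODE_CHARACTER_CLASS);  // Unicode-aware \b for accented letters
    List<String> tokens = new ArrayList<>();
    for (String t : p.split(text)) {
      if (!t.isEmpty()) tokens.add(t);  // drop the empty token before a leading delimiter
    }
    return tokens.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String t1 = "Hello you kyoto! What is on? Here";
    System.out.println(Arrays.toString(split(t1, "?! ", "is")));
    // [Hello, you, kyoto, What, on, Here]
  }
}
```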
    
  • edited June 2016 Answer ✓

    Hi Chrisir,

    I switched to a Comparator. Thanks to a friend of mine, you can also pick one of three different sort types with SortMode. Thanks anyway, Chrisir.

    Edit: I know this is not what we were talking about (delimiters); I'm a bit distracted, sorry.

    I can't delimit words with that approach, Chrisir. I'd have to import javax.xml.parsers.*, but that takes too much time and I'd have to rewrite some things. I'll show you soon why I can't. Big thanks, mate.

                                     // CHOOSE ONE OF THREE SORT TYPES
    int SortMode = 3;      //  1 OCCURRENCES SORT ---- 2 INTERNATIONAL ALPHABET SORT (PLAIN String.compareTo) ---- 3 ITALIAN LATIN ALPHABET SORT (Collator)
    
      switch(SortMode)
      {
        case 1:
          Collections.sort(list, MyStruct.Comparators.OCCORRENZE);
        break;
    
        case 2:
          Collections.sort(list, MyStruct.Comparators.PAROLA);
        break;
    
        case 3:
          Collections.sort(list);
        break;
    
        default:
          Collections.sort(list, MyStruct.Comparators.PAROLA);
          break;
      }  
    
    
    public static class MyStruct implements Comparable<MyStruct> 
    {
    
        private String parola;
        private String versi;
        private int occurrences;
        private int newline;
    
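        // SET THE LOCALE OF THE TXT LANGUAGE - NEEDED TO SORT ACCENTED LETTERS.
        // static: ONE SHARED COLLATOR, SINCE COLLATION KEYS SHOULD ONLY BE COMPARED IF THEY COME FROM THE SAME COLLATOR
        private static final Collator collator = Collator.getInstance(Locale.ITALY);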
        private final CollationKey key;
    
        public MyStruct(String parola, String versi, int occurrences, int newline)
        {
            this.parola = parola;
            this.versi = versi;
            this.occurrences = occurrences;
            this.newline = newline;
            this.key = collator.getCollationKey(parola);
        }
    
        public int compareTo(MyStruct oggetto) 
        {                
         return key.compareTo(oggetto.key);
        }
    
        public static class Comparators 
        {
    
          public static Comparator<MyStruct> PAROLA = new Comparator<MyStruct>() 
            {
    
                public int compare(MyStruct o1, MyStruct o2) 
                {
                    return o1.parola.compareTo(o2.parola);
                }
            };
    
    
            public static Comparator<MyStruct> OCCORRENZE = new Comparator<MyStruct>() 
            {
    
                public int compare(MyStruct o1, MyStruct o2) 
                {
                    return Integer.compare(o1.occurrences, o2.occurrences);  // SAME RESULT AS a - b, BUT SAFE FROM INT OVERFLOW
                }
            };
    
        }
    }
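Here is a sketch of the three sort modes using Java 8 comparator helpers. Two details differ from the code above on purpose. First, it uses a single shared `Collator`: the `CollationKey` docs say keys should only be compared when they come from the same `Collator` object, and `Collator.getInstance()` returns a new copy on each call. Second, it uses `Comparator.comparingInt` instead of subtraction. The sample words are illustrative. They show that `String.compareTo` (mode 2) puts capitals first and accented letters last, while the Italian collator compares letters first and uses accents and case only to break ties.

```java
import java.text.Collator;
import java.util.*;

class SortModes {
  static class Entry {
    final String word;
    final int occurrences;
    Entry(String word, int occurrences) { this.word = word; this.occurrences = occurrences; }
  }

  // One shared Collator instance for all comparisons.
  static final Collator ITALIAN = Collator.getInstance(Locale.ITALY);

  // Mode 1: by occurrences, ascending.
  static final Comparator<Entry> BY_OCCURRENCES = Comparator.comparingInt((Entry e) -> e.occurrences);
  // Mode 2: String.compareTo, i.e. UTF-16 code-unit order ("Z" < "a" < "\u00e0").
  static final Comparator<Entry> BY_CODE_UNIT = Comparator.comparing((Entry e) -> e.word);
  // Mode 3: Italian collation.
  static final Comparator<Entry> BY_ITALIAN = (a, b) -> ITALIAN.compare(a.word, b.word);

  // Sorts a copy of the entries and returns just the words, in order.
  static List<String> sortedWords(Comparator<Entry> order, Entry... entries) {
    List<Entry> list = new ArrayList<>(Arrays.asList(entries));
    Collections.sort(list, order);
    List<String> words = new ArrayList<>();
    for (Entry e : list) words.add(e.word);
    return words;
  }

  public static void main(String[] args) {
    Entry[] sample = {
      new Entry("zucca", 2), new Entry("Zeta", 5), new Entry("\u00e0ncora", 1),
      new Entry("abete", 3), new Entry("casa", 4)
    };
    System.out.println(sortedWords(BY_CODE_UNIT, sample));    // [Zeta, abete, casa, zucca, àncora]
    System.out.println(sortedWords(BY_ITALIAN, sample));      // [abete, àncora, casa, Zeta, zucca]
    System.out.println(sortedWords(BY_OCCURRENCES, sample));  // [àncora, zucca, abete, casa, Zeta]
  }
}
```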
    
  • edited June 2016

    Hi all, here is the first version of the generator.

     //  SPECIAL THANKS AND SUPPORT - Alessandro Vissani - Davide Riboli
     //  DON'T BE SCARED HAVE FUN WITH (SOME) PARAMETERS 
    import java.io.*;          // BufferedWriter, FileOutputStream, OutputStreamWriter, Writer, IOException
    import java.util.*;        // ArrayList, Collections, Comparator, Locale
    import java.text.*;        // Collator, CollationKey
    import processing.pdf.*;   // PDF renderer, PGraphicsPDF
    
    
    
    //  OLD VAR CODE 
    String lines[];
    String[] allwords;    // THIS ARRAY HOLDS ALL THE WORDS
    
    //  VAR FONT
    PFont font;
    
    //  DELIMITERS - ADD OR REMOVE ANY CHARACTERS YOU WANT - WHOLE WORDS CANNOT BE DELIMITERS FOR NOW
    String delimiters = " ,.?=#@°!&%£^$_§\';:*`()[]-\""; // DO NOT USE "/" AS DELIMITER
    
    IntDict concordance;
    
    boolean savePDF = false;
    PImage img;
    color[] colors;
    
                            // CHOOSE ONE OF THREE SORT TYPES
    int SortMode = 3;      //  1 OCCURRENCES SORT ---- 2 INTERNATIONAL ALPHABET SORT (PLAIN String.compareTo) ---- 3 ITALIAN LATIN ALPHABET SORT (Collator)
    
    
    //****************************//
    // LET'S BEGIN - START SETUP  //
    //****************************//
    
    
    void setup()
    {
      size(440, 737, PDF, "type_name_output.pdf");   //  PAGE SIZE - PDF FILE NAME - CAREFUL: IT OVERWRITES THE OLDER FILE, SO CHANGE THE NAME BEFORE RUNNING
    
     ArrayList<MyStruct> list = new ArrayList<MyStruct>();
     list.add(new MyStruct("","",0, 3));
    
      //********************************************************
      // LOAD FILE TXT INTO ARRAY STRING - WRITE HERE THE FILE *
      //********************************************************
      String url2 = "type_name_input.txt";         //  CHANGE FILE HERE - THE FILE MUST BE UTF-8
      String[] rawtext2 = loadStrings(url2);
    
      //*********************************************************
      // JOIN ALL LINES INTO ONE LONG STRING, MARKING LINE ENDS *
      //*********************************************************
      // " / " IS PUT BETWEEN LINES, SO EVERY "/" TOKEN LATER MEANS "NEXT VERSE".
      // JOINING IN MEMORY AVOIDS WRITING A TEMPORARY FILE RELATIVE TO THE JAVA
      // WORKING DIRECTORY (WHICH IS NOT NECESSARILY THE SKETCH FOLDER).
      String everything = join(rawtext2, " / ");
    
      // Note the use of splitTokens() since we are using spaces and punctuation marks all as delimiters.  
      // ************************************************************************************************
      allwords = splitTokens(everything, delimiters);
    
      // Make a new empty dictionary.
      // ****************************
      concordance = new IntDict();
    
    
      for (int i = 0; i < allwords.length; i++) 
      {
        String s = allwords[i].toLowerCase();
        concordance.increment(s);
      }
    
    
      //*************************
      //* **** MEGA OBJECT **** *
      //*************************
    
      String word = "";
      int dimlist = 1;
      boolean found = false;
      int verse = 1;  // VERSE COUNTER - MUST MATCH THE LINE NUMBERING OF THE ORIGINAL TEXT (FIRST LINE = VERSE 1)
    
      //***********************************
      // SEARCHING WORDS, STORING FUNCT   *
      //***********************************
      for (int i = 0; i < allwords.length; i++) 
      {
        word = allwords[i].toLowerCase(); // LOWERCASE ALL WORDS SO THAT "Hello" AND "hello" SHARE THE SAME COUNTER
                                          // REMOVE toLowerCase() TO EXPERIMENT WITH CASE-SENSITIVE COUNTING
    
        if (word.equals("/"))             // "/" MARKS THE END OF A LINE, I.E. A NEW VERSE
        {
          verse++;
        }
    
        found = false; 
    
        for (int g = 0; g < dimlist; g++)
        {
          if (list.get(g).parola.equals(word))  //  CHECKS WHETHER THE WORD IS ALREADY IN THE ARRAYLIST
          {                                                                        
            found = true; 
            list.get(g).occurrences++;
            list.get(g).versi = list.get(g).versi + ", " + verse;
            break;  // EACH WORD IS STORED ONLY ONCE, SO WE CAN STOP SEARCHING
          }
        }
    
        if (!found)
        {
          list.add(new MyStruct(word, String.valueOf(verse), 1, 3));
          dimlist++;
        }  
      }
    
      //***********************//
      // ALPHABET SWITCH FUNCT //
      //*********************//
    
      switch(SortMode)
      {
        case 1:
          Collections.sort(list, MyStruct.Comparators.OCCORRENZE); // OCCURRENCES
        break;
    
        case 2:
          Collections.sort(list, MyStruct.Comparators.PAROLA); // WORD
        break;
    
        case 3:
          Collections.sort(list);
        break;
    
        default:
          Collections.sort(list, MyStruct.Comparators.PAROLA); // WORD
          break;
      }  
    
    
      //*****************************//
      // GRAPHIC INFOS - PRINT PDF  //
      //****************************//
    
      PGraphicsPDF pdf = (PGraphicsPDF) g;  // GETTING THE RENDER PDF
    
      font = createFont ("Titillium-Regular.otf", 15);
      textFont(font);
    
      pdf.background(255);          // #1 PAPER COLOR FOR THE FIRST PAGE - SEE BELOW FOR THE NEXT PAGES
      pdf.fill(60, 60, 60);           // FONT COLOR
      pdf.textFont(font, 11);
      pdf.textSize(11);             // FONT SIZE
    
      String row;
      String pages;
      int counterrow;
    
    
      pages= "";
      row = "";       // LINES OR ROW
      counterrow=0;   // ROW COUNTER - DON'T CHANGE IT HERE; SET THE LIMIT FURTHER BELOW
    
    
      //******************************
      // COUNTER LINES FUNCT (VERSI) *
      //******************************
    
          for (int h = 0; h < list.size(); h++)   // VISIT EVERY ENTRY; THE FILTER BELOW SKIPS "", "/" AND "?"
          {
             if (!list.get(h).parola.equals("/") && !list.get(h).parola.equals("") && !list.get(h).parola.equals("?"))
             {
                 row = list.get(h).parola + ": " + list.get(h).occurrences + "; vers.  " + list.get(h).versi + "." + "\n" + "\n";  // WHAT TO WRITE ON THE PDF
                 pages = pages + row; 
                 counterrow++;                     // COUNT THE ROW BEFORE CHECKING, SO EVERY FULL PAGE HOLDS THE SAME NUMBER OF ROWS
    
                 if (counterrow >= 17)              // SET HERE HOW MANY ENTRIES ARE PRINTED PER PDF PAGE
                 {
    
                   //rect(40, 60, 300, 630);      // CAGE VISIBLE OR NOT
                   //stroke(1);
                   pdf.background(255);            // #2 PAPER COLOR FOR EACH NEW PAGE - USUALLY SET TO THE SAME VALUE AS ABOVE (#1 BACKGROUND)
                   noFill();
                   text(pages, 40, 60, 250, 1000);  // CAGE TEXT - POSITION X-Y | WIDTH-HEIGHT
                   counterrow = 0;
                   pdf.nextPage();
                   pages = "";
                 }
             } 
          }
          noFill();                         // THIS IS THE LAST PAGE
          text(pages, 40, 60, 250, 1000);  // CAGE TEXT - POSITION X-Y | WIDTH-HEIGHT
    
    
      println("I'm done. Please check your PDF in the sketch folder."); // DONE MESSAGE - FILE READY!!!
      exit();  // QUIT SO THE PDF RENDERER CLOSES AND FINISHES THE FILE
    
    }//SETUP FINISHED
    
    
    
    //*******************************
    //OBJECT WORD - LINES - COUNTER *
    //*******************************
    public static class MyStruct implements Comparable<MyStruct> 
    {
    
        private String parola;
        private String versi;
        private int occurrences;
        private int newline;
    
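        // SET THE LOCALE OF THE TXT LANGUAGE - NEEDED TO SORT ACCENTED LETTERS.
        // static: ONE SHARED COLLATOR, SINCE COLLATION KEYS SHOULD ONLY BE COMPARED IF THEY COME FROM THE SAME COLLATOR
        private static final Collator collator = Collator.getInstance(Locale.ITALY);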
        private final CollationKey key;
    
        public MyStruct(String parola, String versi, int occurrences, int newline)
        {
            this.parola = parola;
            this.versi = versi;
            this.occurrences = occurrences;
            this.newline = newline;
            this.key = collator.getCollationKey(parola);
        }
    
        public int compareTo(MyStruct oggetto) 
        {                
         return key.compareTo(oggetto.key);
        }
    
        public static class Comparators 
        {
    
          public static Comparator<MyStruct> PAROLA = new Comparator<MyStruct>() 
            {
    
                public int compare(MyStruct o1, MyStruct o2) 
                {
                    return o1.parola.compareTo(o2.parola);
                }
            };
    
    
            public static Comparator<MyStruct> OCCORRENZE = new Comparator<MyStruct>() 
            {
    
                public int compare(MyStruct o1, MyStruct o2) 
                {
                    return Integer.compare(o1.occurrences, o2.occurrences);  // SAME RESULT AS a - b, BUT SAFE FROM INT OVERFLOW
                }
            };
    
        }
    }
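The page-break bookkeeping (`counterrow`, `nextPage()`) boils down to grouping the entries into chunks of N rows. Below is a plain-Java sketch of just that grouping; `Paginate` is a made-up name. In the sketch above, each chunk would be drawn with `text()`. It would then be followed by `pdf.nextPage()` only if another chunk comes after it, which would avoid a blank last page when the entry count is an exact multiple of the page size.

```java
import java.util.*;

class Paginate {
  // Groups rows into pages of at most `rowsPerPage` rows (the last page may be shorter).
  static List<List<String>> pages(List<String> rows, int rowsPerPage) {
    List<List<String>> pages = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += rowsPerPage) {
      int end = Math.min(start + rowsPerPage, rows.size());
      pages.add(new ArrayList<>(rows.subList(start, end)));
    }
    return pages;
  }

  public static void main(String[] args) {
    List<String> rows = new ArrayList<>();
    for (int i = 1; i <= 40; i++) rows.add("word" + i);
    List<List<String>> p = pages(rows, 17);
    System.out.println(p.size() + " pages, last has " + p.get(p.size() - 1).size() + " rows");
    // 3 pages, last has 6 rows
  }
}
```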
    
  • Well done!

  • edited June 2016

    Well, sorry for not explaining the whole project; I need some time to translate the document. Anyway, thanks for the support, I'll credit you, Chrisir.

    Edit: I know it is not well programmed (too many switches), but as a novice coder, I guess it's not bad.
