Concordance_txt_loadString_

FromKyoto · April 2016

Hi all,

I'm new to Processing, and to learn it I decided to use it for my thesis project [tough guy]. The highlight of the project is creating a concordance book from a famous epic poem (open source too). [The poem is in a txt file.]

So, for every word the code should report the total word count, the word count per chapter, and the verse positions.

Ex.= "Gamble = 340, Ch. 1 - 120, Ch 2 - 220. Vrs: 142, 140, 3, 5, 66, [...]."

I've seen loadStrings() and that part is fine, but what I don't understand is this: can I call loadStrings() several times, once per chapter, so that the files act as a kind of delimiter for the per-chapter counter?

My first idea was to take the txt file and split it into one file per chapter. Is that the right way to do it?

I watched Daniel's lessons on YouTube, which were very helpful [thanks], and I bought all the books. The problem is that I don't really know where to start. Sorry for the trouble, and thanks for your help.

Chrisir · April 2016

Have you seen this?

https://forum.processing.org/two/discussion/15883/word-counter-question#latest

Chrisir · April 2016

Are the verse positions also per chapter or for the entire book?

Which book is it?

Is the txt file for the entire book, or is there one txt file per chapter?

If it is one txt file for the entire book, load the entire book and split it up into chapters.

Have an array textsOfTheChapters[] where you store the texts.
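
A minimal sketch of the "load the whole book, then split it into chapters" idea, written in plain Java rather than as a Processing sketch. It assumes every chapter starts with a heading line that begins with a known marker. The marker "CHAPTER" and the names ChapterSplitter and splitIntoChapters are hypothetical, since the thread does not show how the txt file marks its chapters.

```java
import java.util.ArrayList;
import java.util.List;

class ChapterSplitter {

  // Splits the lines of a book into chapters. A new chapter starts at every
  // line that begins with headingMarker (an assumed convention, not from the thread).
  // Lines before the first heading are ignored.
  static List<List<String>> splitIntoChapters(String[] lines, String headingMarker) {
    List<List<String>> textsOfTheChapters = new ArrayList<>();
    List<String> current = null;
    for (String line : lines) {
      if (line.startsWith(headingMarker)) {
        current = new ArrayList<>();
        textsOfTheChapters.add(current);
      } else if (current != null) {
        current.add(line);
      }
    }
    return textsOfTheChapters;
  }

  public static void main(String[] args) {
    String[] book = {
      "CHAPTER 1", "first verse", "second verse",
      "CHAPTER 2", "third verse"
    };
    List<List<String>> chapters = splitIntoChapters(book, "CHAPTER");
    for (int c = 0; c < chapters.size(); c++) {
      System.out.println("Chapter " + (c + 1) + ": " + chapters.get(c).size() + " verses");
    }
  }
}
```

In Processing the input array would come from loadStrings(), and each inner list plays the role of one entry of textsOfTheChapters[].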

FromKyoto · April 2016

Hi Mr Chrisir. Thanks for the response. And sorry for the bad grammar, I'm still learning English.

Well, starting with the first question: no, I haven't seen it yet. I'll look at it right now.

The verse positions are for the entire book.

The book is The Divine Comedy by Dante Alighieri.

I have both: one txt file for the entire book and one per chapter.

OK, I'll do it. Thanks.

Chrisir · April 2016

Are the txt files like

verse number, text ?

Then detect the comma and split there.

Maybe you can use split() if the separator is

","

FromKyoto · April 2016

I don't fully follow, but:

• I have 3 txt files, one for each chapter: Hell, Paradise and Purgatory. [I split them myself.]
• Then I have the single [original] file that contains the whole text of the book.
• Some lines don't end with a comma [if I understand correctly].

To simplify the question, here is an example:

[image: Capture]

Chrisir · April 2016

Thank you.

OK, so there are no line numbers in the txt file.

FromKyoto · April 2016

So I guess the only way is to convert it to XML. Right?

Chrisir · April 2016

No, XML is not required here.

Make backup copies of all 4 txt files first.

Use a HashMap to keep track of which words you already have.

Chrisir · April 2016

Here is an example (not by me):

Chrisir · April 2016

//  https://forum.processing.org/two/discussion/14188/finding-the-index-numbers-of-an-array-of-similiar-strings#latest

//  list all the index values of the String array member, c (like {3,4,7})?

import java.util.Map;

String[] source= {"a", "b", "d", "c", "c", "e", "f", "c"};

// Note the HashMap's "key" is a String and "value" is an int array 
HashMap <String, int[] > hm = new HashMap<String, int[]>();

void setup() {
  size(300, 300);
  background(0);

  // Putting key-value pairs in the HashMap
  setupArrays();

  // Using an enhanced loop to iterate over each entry
  for (Map.Entry<String, int[]> me : hm.entrySet()) {
    print(me.getKey() + " is ");
    printArray(me.getValue());
  }

  // We can also access values by their key
  int[] val = null;
  val=hm.get("c");
  if (val!=null) {
    println("\n==================\n now show c : ");
    printArray(val);
  }//if
}//func

void draw() {
  // 
  background(0);
  int columnNumber=0;
  for (char letter = 'a'; letter<='z'; letter++) {
    String letterAsString = trim(str(letter)); 
    // Exists?
    if (hm.containsKey(letterAsString)) {
      int[] val = null;
      val=hm.get(letterAsString);
      // Found? 
      if (val!=null) {
        fill(255); // white 
        text(letterAsString, 56+columnNumber*19, 50);
        fill(255, 0, 0); // red
        // loop over the array
        for (int i=0; i<val.length; i++) {
          // show values of the array under each other
          text(val[i], 56+columnNumber*19, 50+20+17*i);
        }//for
        columnNumber++;
      }// if
    }// if
  }//for
}//func 

void setupArrays() {
  int i=0;
  // looping over the entire source array
  for (String currentString : source) {

    // do we have the letter already?
    if (hm.containsKey(currentString)) {
      // yes, old one
      int [] temp = hm.get(currentString);
      temp = append(temp, i);
      hm.put(currentString, temp);
    }// 
    else {
      // No, new letter 
      int[] temp = new int [1];
      temp[0]=i;
      hm.put(currentString, temp);
    }//else 

    // increase index
    i++;
  }//for
}//function
//

Chrisir · April 2016

This line is what you want to achieve:

Gamble = 340, Ch. 1 - 120, Ch 2 - 220. Vrs: 142, 140, 3, 5, 66, [...].

Now, I think you can get "Gamble" with the HashMap code above.

You can also get 340 that way.

Please loop through the entire book first to get:

Gamble
340
all verses (since the verse positions are for the entire book)

NOW Ch. 1 - 120, Ch 2 - 220

Now load Ch. 1 only

(A) and keep the data you already have.

Count.

Then store the count result.

Load Ch. 2 only.

Repeat the above from (A).

OOP

To be able to store the data, don't use an int[] as in my example but a CLASS instead (OOP).

The class WordData holds all the data you want to collect for one word:

String Word; // Gamble  

int countEntireBook; // 340 
int countChapter1;   // 120 
int countChapter2;   // 220 

int[] verses;      // 142, 140, 3, 5, 66

See this tutorial on OOP:

https://www.processing.org/tutorials/objects/
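
The procedure above (total count, per-chapter counts, verse positions over the whole book) could be sketched in one pass as follows. This is plain Java under a few assumptions that are not from the thread: the chapters are already available as arrays of lines, one verse per line, and a word is any run of letters (so an apostrophe splits a word). The names ConcordanceSketch and build are hypothetical, and countPerChapter is an array instead of the separate countChapter1 and countChapter2 fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConcordanceSketch {

  // Everything collected for one word (mirrors the WordData idea above).
  static class WordData {
    int countEntireBook;
    int[] countPerChapter;                      // one counter per chapter
    List<Integer> verses = new ArrayList<>();   // verse numbers, counted over the whole book

    WordData(int chapterCount) {
      countPerChapter = new int[chapterCount];
    }
  }

  // chapters[c] holds the verse lines of chapter c.
  static Map<String, WordData> build(String[][] chapters) {
    Map<String, WordData> hm = new TreeMap<>();
    int verse = 0;   // keeps counting across chapters
    for (int c = 0; c < chapters.length; c++) {
      for (String line : chapters[c]) {
        verse++;
        // Assumed tokenization: split on anything that is not a letter.
        for (String token : line.toLowerCase().split("[^\\p{L}]+")) {
          if (token.isEmpty()) continue;
          WordData wd = hm.get(token);
          if (wd == null) {
            wd = new WordData(chapters.length);
            hm.put(token, wd);
          }
          wd.countEntireBook++;
          wd.countPerChapter[c]++;
          wd.verses.add(verse);   // one entry per occurrence
        }
      }
    }
    return hm;
  }

  public static void main(String[] args) {
    String[][] chapters = { {"the cat", "the dog"}, {"a cat"} };
    for (Map.Entry<String, WordData> e : build(chapters).entrySet()) {
      WordData wd = e.getValue();
      System.out.println(e.getKey() + " = " + wd.countEntireBook + ", Vrs: " + wd.verses);
    }
  }
}
```

Because the chapter index is known while looping, the book does not have to be loaded a second time per chapter; that is a design choice of this sketch, not something the thread prescribes.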

Chrisir · April 2016

//  https://forum.processing.org/two/discussion/14188/finding-the-index-numbers-of-an-array-of-similiar-strings#latest

//  list all the index values of the String array member, c (like {3,4,7})?

import java.util.Map;

String[] POEM = {"cat", "ball", "d", "c", "c", "e", "f", "c", "cat", "cat", "s", "ball", "cat"};

// Note the HashMap's "key" is a String and "value" is a class
HashMap <String, WordData > hm = new HashMap<String, WordData>();

void setup() {
  size(900, 300);
  background(0);

  // Putting key-value pairs in the HashMap
  setupArrays();

  // Using an enhanced loop to iterate over each entry
  for (Map.Entry<String, WordData> me : hm.entrySet()) {
    WordData wd = me.getValue();
    println(me.getKey() + " occurs " + wd.countEntireBook + " time(s)");
  }

  // We can also access values by their key
  WordData val = null;

  val=hm.get("c");

  if (val!=null) {
    print("\n==================\n now show c : ");
    println(val.countEntireBook);
  }//if
}//func

void draw() {
  // 
  background(0);
  int columnNumber=0;
  int factorX = 39; 

  for (Map.Entry me : hm.entrySet()) {

    WordData wd = (WordData) me.getValue(); 

    String letterAsString = trim((wd.wordAsString)); 
    // Exists?
    if (hm.containsKey(letterAsString)) {

      WordData val = null;
      val=hm.get(letterAsString);

      // Found? 
      if (val!=null) {
        // yes 
        // show the word in white
        fill(255); // white 
        text(letterAsString, 56+columnNumber*factorX, 50);
        // show the count in BLUE 
        fill(0, 0, 255); // blue 
        text(wd.countEntireBook, 56+columnNumber*factorX, 70);
        // show the positions in RED
        fill(255, 0, 0); // red
        // loop over the array
        for (int i=0; i<val.verses.length; i++) {
          // show values of the array under each other
          text(val.verses[i], 56+columnNumber*factorX, 50+20+40+(19*i));
        }//for
        columnNumber++;
      }// if
    }// if
  }//for
}//func 

void setupArrays() {
  int i=0;
  // looping over the entire POEM array
  for (String currentString : POEM) {

    // do we have the word already?
    if (hm.containsKey(currentString)) {
      // yes, old one
      // we load the entire class       
      WordData temp = hm.get(currentString);
      // we change data in the class
      temp.countEntireBook++;
      temp.verses = append(temp.verses, i);
      // we put the updated class back in 
      hm.put(currentString, temp);
    }// 
    else {
      // No, new word
      WordData temp = new WordData();
      temp.wordAsString=currentString;
      temp.countEntireBook=1;
      temp.verses = append(temp.verses, i);
      hm.put(currentString, temp);
    }//else 

    // increase index
    i++;
  }//for
}//function

// =================================================================

class WordData {

  // WordData holds all the data you want to collect for one word:

  String wordAsString=""; // Gamble  

  int countEntireBook; // 340 
  int countChapter1;   // 120 
  int countChapter2;   // 220 

  int[] verses = new int[0];      // 142, 140, 3, 5, 66


  WordData() {
  }
}

//

FromKyoto · April 2016

Many thanks, Chrisir, I'll keep you updated. Have a good weekend.

FromKyoto · April 2016

Hi Chrisir!

I'm back again to update you (if you are interested). I talked with my supervisor and we changed a few things to make the project easier to program.

I no longer need a per-chapter counter, because we decided to split the chapters themselves into separate txt files.
So I only need a word counter sorted a-z (this is done) and a verse counter, which should be a "line counter(?)". I'm still scratching my head over this one and can't find good code for it.
One question: is it possible to export this list to a multi-page PDF?

If you want to see how far I've come, I'll send you the code.

Best,

FK

Chrisir · April 2016

a counter verse, which should be a "counter lines(?)" (on this i'm still scraping my head and i can't find a good code).

When you loop over the lines with a for loop (let's say with index i), isn't the value of i simply the line number, i.e. the verse number?

export this list on a multi-pages PDF?

This might get you started: https://www.processing.org/tutorials/print/

FromKyoto · May 2016

Sorry, I should have been more precise: by "verse counter" I mean that it has to say in which verses each word is located. Ex.: "The: 4, verse 4, 5, 6, 8."

I saw that PDF guide, and I also found pdfbox, but even that is not a good fit. I guess I'm going to export it without them.

Thanks anyway Chrisir.

Chrisir · May 2016

Isn't one verse one line?
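
If one verse is indeed one line, the verse list per word falls out of the line index. A minimal sketch in plain Java, assuming one verse per line with no blank lines in between (blank lines would shift the numbering) and treating any run of letters as a word. The names VerseIndex, versesPerWord and format are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class VerseIndex {

  // Maps each lowercased word to the set of verse numbers it occurs in.
  // Verse number = line index + 1 (assumes one verse per line).
  static Map<String, TreeSet<Integer>> versesPerWord(String[] lines) {
    Map<String, TreeSet<Integer>> index = new TreeMap<>();
    for (int i = 0; i < lines.length; i++) {
      for (String word : lines[i].toLowerCase().split("[^\\p{L}]+")) {
        if (word.isEmpty()) continue;
        index.computeIfAbsent(word, k -> new TreeSet<>()).add(i + 1);
      }
    }
    return index;
  }

  // Formats one entry, e.g. "the: verse 1, 3".
  static String format(String word, TreeSet<Integer> verses) {
    StringBuilder sb = new StringBuilder(word).append(": verse ");
    String sep = "";
    for (int v : verses) {
      sb.append(sep).append(v);
      sep = ", ";
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] lines = {"The cat", "a dog", "the end"};
    for (Map.Entry<String, TreeSet<Integer>> e : versesPerWord(lines).entrySet()) {
      System.out.println(format(e.getKey(), e.getValue()));
    }
  }
}
```

A TreeSet keeps each verse number once even when a word appears twice in the same verse; use a list instead if every occurrence should be listed.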

Chrisir · May 2016

You can just write a text file, load it in MS Word, format it and save it as a PDF.

Use saveStrings().
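
Outside Processing, the same "one entry per line in a text file" export could look roughly like this in plain Java. The class TextExport, its methods and the file name concordance.txt are hypothetical; inside a sketch, saveStrings() does the equivalent job with a String array.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TextExport {

  // Writes one concordance entry per line, encoded as UTF-8.
  static void saveLines(Path target, List<String> lines) {
    try {
      Files.write(target, lines, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the file back, mainly to check what was written.
  static List<String> loadLines(Path source) {
    try {
      return Files.readAllLines(source, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path target = Paths.get("concordance.txt");   // hypothetical output file
    saveLines(target, Arrays.asList("amore: 2; vers. 1, 3.", "stelle: 1; vers. 2."));
    System.out.println(loadLines(target));
  }
}
```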

FromKyoto · May 2016

Hi Chrisir, great news: the code is almost finished. I'll post the project here as soon as possible. Just one thing: how can I use whole words as delimiters in the same delimiter string? For example: String delimiters = " ,.?!§\';:`()[]-\""; I want to add words or letters, separated by spaces, to that string. Is that possible? I tried "\/s/" and it didn't work. Same with *example (with commas).

Chrisir · May 2016

Spaces?

Did you try putting a space character between ? and !

like this: ? !

Chrisir · May 2016

String t1 = "Hello you kyoto";

String[] arrayString=split(t1, " "); 

printArray( arrayString);

Ah, you mean this: splitTokens():

String t1 = "Hello you kyoto! What is on? Here";

String[] arrayString = splitTokens(t1, "?! " ); 

printArray( arrayString);
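
For comparison, the same tokenization in plain Java: StringTokenizer also treats every character of the delimiter string as a separator, so a delimiter string cannot contain whole words. If the goal was to leave out whole words as well, one possible reading of the question is to filter them after tokenizing. The names TokenSketch, tokens and withoutWords are hypothetical, as is the list of excluded words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class TokenSketch {

  // Every character in delimiters separates tokens; runs of delimiters are skipped.
  static List<String> tokens(String text, String delimiters) {
    List<String> out = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(text, delimiters);
    while (st.hasMoreTokens()) {
      out.add(st.nextToken());
    }
    return out;
  }

  // Drops whole words after tokenizing (case-insensitive).
  // excluded is expected to contain lowercase words.
  static List<String> withoutWords(List<String> tokens, Set<String> excluded) {
    List<String> kept = new ArrayList<>();
    for (String t : tokens) {
      if (!excluded.contains(t.toLowerCase())) {
        kept.add(t);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> all = tokens("Hello you kyoto! What is on? Here", "?! ");
    Set<String> excluded = new HashSet<>(Arrays.asList("is", "on"));
    System.out.println(all);
    System.out.println(withoutWords(all, excluded));
  }
}
```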

FromKyoto · June 2016

Hi Chrisir,

I made a switch and used a Comparator. Thanks to a friend of mine, you can also choose between 3 different types of sorting via SortMode. Thanks anyway, Chrisir.

Edit: I know this is not what we were talking about (delimiters); I'm a bit scatterbrained, sorry.

I can't delimit words with that approach, Chrisir. I'll have to import javax.xml.parsers.*; but that takes too much time and I'd have to rewrite some things. I'll show you soon why I can't. Big thanks, mate.

// CHOOSE ONE OF THREE SORT TYPES
int SortMode = 3;      //  1 OCCURRENCES SORT ---- 2 INTERNATIONAL ALPHABET SORT ---- 3 ITALIAN LATIN ALPHABET SORT

  switch(SortMode)
  {
    case 1:
      Collections.sort(list, MyStruct.Comparators.OCCORRENZE);
    break;

    case 2:
      Collections.sort(list, MyStruct.Comparators.PAROLA);
    break;

    case 3:
      Collections.sort(list);
    break;

    default:
      Collections.sort(list, MyStruct.Comparators.PAROLA);
      break;
  }  


public static class MyStruct implements Comparable<MyStruct> 
{

    private String parola;
    private String versi;
    private int occurrences;
    private int newline;

    Collator collator = Collator.getInstance(Locale.ITALY); // PUT THE REGION REGARDING THE TXT LANGUAGE - ACCENTS RECO *
    private final CollationKey key;

    public MyStruct(String parola, String versi, int occurrences, int newline)
    {
        this.parola = parola;
        this.versi = versi;
        this.occurrences = occurrences;
        this.newline = newline;
        this.key = collator.getCollationKey(parola);
    }

    public int compareTo(MyStruct oggetto) 
    {                
     return key.compareTo(oggetto.key);
    }

    public static class Comparators 
    {

      public static Comparator<MyStruct> PAROLA = new Comparator<MyStruct>() 
        {

            public int compare(MyStruct o1, MyStruct o2) 
            {
                return o1.parola.compareTo(o2.parola);
            }
        };


        public static Comparator<MyStruct> OCCORRENZE = new Comparator<MyStruct>() 
        {

            public int compare(MyStruct o1, MyStruct o2) 
            {
                return o1.occurrences - o2.occurrences;
            }
        };

    }
}
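
Why the Collator matters for sort mode 3: String.compareTo orders by character code, so accented letters such as "è" end up after "z", whereas a Collator places them next to their base letter. A small sketch in plain Java to illustrate the difference; the class and method names are hypothetical and the word list is made up.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class SortSketch {

  // Plain String order (what a compareTo-based comparator such as PAROLA gives).
  static List<String> sortedByCharCode(List<String> words) {
    List<String> copy = new ArrayList<>(words);
    Collections.sort(copy);
    return copy;
  }

  // Locale-aware order, the same mechanism MyStruct uses through CollationKey.
  static List<String> sortedItalian(List<String> words) {
    List<String> copy = new ArrayList<>(words);
    Collections.sort(copy, Collator.getInstance(Locale.ITALY));
    return copy;
  }

  public static void main(String[] args) {
    // "\u00e8" is the letter e with a grave accent
    List<String> words = Arrays.asList("zio", "\u00e8", "amore");
    System.out.println(sortedByCharCode(words));  // accented word comes last
    System.out.println(sortedItalian(words));     // accented word sorts near "e"
  }
}
```

MyStruct precomputes a CollationKey per word, which is the usual choice when the same strings are compared many times during a sort.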

FromKyoto · June 2016

Hi all, here is the first type of generator.

 //  SPECIAL THANKS AND SUPPORT - Alessandro Vissani - Davide Riboli
 //  DON'T BE SCARED HAVE FUN WITH (SOME) PARAMETERS 
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Locale;

import processing.pdf.*;



//  OLD VAR CODE 
String lines[];
String[] allwords;    // THIS ARRAY HOLDS ALL THE WORDS

//  VAR FONT
PFont font;

//  DELIMITER - ADD OR ELIMINATE ALL YOU WANT - FOR NOW YOU CANNOT DELIMIT WORDS
String delimiters = " ,.?=#@°!&%£^$_§\';:*`()[]-\""; // DO NOT USE "/" AS DELIMITER

IntDict concordance;

boolean savePDF = false;
PImage img;
color[] colors;

                        // CHOOSE ONE OF THREE SORT TYPES
int SortMode = 3;      //  1 OCCURRENCES SORT ---- 2 INTERNATIONAL ALPHABET SORT ---- 3 ITALIAN LATIN ALPHABET SORT


//****************************//
// LET'S BEGIN - START SETUP  //
//****************************//


void setup()
{
  size(440, 737, PDF, "type_name_output.pdf");   //  PAGE SIZE - PDF FILE NAME - CAREFUL: IT WILL OVERWRITE AN OLDER FILE - CHANGE THE NAME BEFORE RUNNING

 ArrayList<MyStruct> list = new ArrayList<MyStruct>();
 list.add(new MyStruct("","",0, 3));

  //********************************************************
  // LOAD FILE TXT INTO ARRAY STRING - WRITE HERE THE FILE *
  //********************************************************
  String url2 = "type_name_input.txt";         //  CHANGE FILE HERE - THE FILE MUST BE UTF-8
  String[] rawtext2 = loadStrings(url2);

  //*****************************
  // CREATE FILE TEMP ***********
  //*****************************
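  Writer writer = null;
  try 
  {
    // sketchPath() PUTS THE TEMP FILE IN THE SKETCH FOLDER, WHERE loadStrings() CAN FIND IT
    writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(sketchPath("filetemp.txt")), "utf-8"));

    for (int i = 0; i < rawtext2.length; i++) 
    {
      writer.write(rawtext2[i]);
      writer.write(" / ");   // " / " MARKS THE END OF EACH LINE (VERSE)
    }
    writer.close();
  }
  catch(IOException e)
  {
    e.printStackTrace();   // REPORT WRITE ERRORS INSTEAD OF IGNORING THEM
  }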

  String url = "filetemp.txt";
  String[] rawtext = loadStrings(url);

  // Join the big array together as one long string.
  // ***********************************************
  String everything = join(rawtext, "" );

  // Note the use of splitTokens() since we are using spaces and punctuation marks all as delimiters.  
  // ************************************************************************************************
  allwords = splitTokens(everything, delimiters);

  // Make a new empty dictionary.
  // ****************************
  concordance = new IntDict();


  for (int i = 0; i < allwords.length; i++) 
  {
    String s = allwords[i].toLowerCase();
    concordance.increment(s);
  }


  //*************************
  //* **** MEGA OBJECT **** *
  //*************************

  String word = "";
  int dimlist = 1;
  boolean found = false;
  int verse = 1;  // THIS VARIABLE NEEDS TO BE SET BASED ON THE ORIGINAL TEXT LINE

  //***********************************
  // SEARCHING WORDS, STORING FUNCT   *
  //***********************************
  for (int i = 0; i < allwords.length; i++) 
  {
    word = allwords[i].toLowerCase(); // MAKES ALL WORDS LOWERCASE TO NOT CREATE DUPLICATES BETWEEN SAME WORDS: "Hello" "hello" on the same counter
                                      // ANYWAY YOU CAN CHANGE IT TO TRY NEW INSIGHTS ON SENSITIVE CASES

    if(word.equals("/") == true)
    {
      verse ++;
    }

    found = false; 

    for (int g = 0; g < dimlist; g++)
    {
      if(list.get(g).parola.equals(word) == true)  //  IT CHECKS IF WORD IS PRESENT INTO THE ARRAYLIST
      {                                                                        
        found = true; 

        list.get(g).occurrences ++;

        list.get(g).versi = list.get(g).versi + ", ";

        list.get(g).versi = list.get(g).versi + String.valueOf(verse);

      }

    }

    if(found == false)
    {
      list.add(new MyStruct(word,String.valueOf(verse),1, 3));
      dimlist ++ ;
    }  

  }

  //***********************//
  // ALPHABET SWITCH FUNCT //
  //*********************//

  switch(SortMode)
  {
    case 1:
      Collections.sort(list, MyStruct.Comparators.OCCORRENZE); // OCCURRENCES
    break;

    case 2:
      Collections.sort(list, MyStruct.Comparators.PAROLA); // WORD
    break;

    case 3:
      Collections.sort(list);
    break;

    default:
      Collections.sort(list, MyStruct.Comparators.PAROLA); // WORD
      break;
  }  


  //*****************************//
  // GRAPHIC INFOS - PRINT PDF  //
  //****************************//

  PGraphicsPDF pdf = (PGraphicsPDF) g;  // GETTING THE RENDER PDF

  font = createFont ("Titillium-Regular.otf", 15);
  textFont(font);

  pdf.background(255);          // #1 PAPER COLOR FOR THE FIRST PAGE - SEE BEYOND FOR NEXT PAGES
  pdf.fill(60, 60, 60);           // FONT COLOR
  pdf.textFont(font, 11);
  pdf.textSize(11);             // FONT SIZE

  String row;
  String pages;
  int counterrow;


  pages= "";
  row = "";       // LINES OR ROW
  counterrow=0;   // LINES LIMITER - DON'T CHANGE IT HERE - BUT BEYOND


  //******************************
  // COUNTER LINES FUNCT (VERSI) *
  //******************************

      for(int h = 0; h < list.size(); h ++)   // VISIT EVERY ENTRY - THE IF BELOW ALREADY SKIPS "/", "" AND "?"
      {
         if((list.get(h).parola.equals("/") == false) && (list.get(h).parola.equals("") == false) && (list.get(h).parola.equals("?") == false))
         {
             row = list.get(h).parola + ": " + list.get(h).occurrences + "; vers.  " + list.get(h).versi + "." + "\n" + "\n";  // WHAT HE MUST WRITE ON THE PDF
             pages = pages + row; 

             if(counterrow >= 17)              //SETUP HERE HOW MANY LINES SHOULD BE PRINT ON PDF PER PAGE//
             {

               //rect(40, 60, 300, 630);      // CAGE VISIBILE OR NOT
               //stroke(1);
               pdf.background(255);            // #2 PAPER COLOR FOR EACH NEW PAGE - USUALLY SETTED WITH THE SAME VALUE ABOVE (#1 BACKGROUND)
               noFill();
               text(pages, 40, 60, 250, 1000);  // CAGE TEXT - POSITION X-Y | WIDTH LENGHT X-Y
               counterrow = 0;
               pdf.nextPage();
               pages = "";
             }

             counterrow++;

         } 
      }
      noFill();                         // THIS IS THE LAST PAGE
      text(pages, 40, 60, 250, 1000);  // CAGE TEXT - POSITION X-Y | WIDTH LENGHT X-Y


  println("I'm done. Please check your PDF located in the sketch folder."); // DONE MESSAGE - FILE READY!!!
  exit();   // FINISHES AND CLOSES THE PDF FILE

}//SETUP FINISHED



//*******************************
//OBJECT WORD - LINES - COUNTER *
//*******************************
public static class MyStruct implements Comparable<MyStruct> 
{

    private String parola;
    private String versi;
    private int occurrences;
    private int newline;

    Collator collator = Collator.getInstance(Locale.ITALY); // PUT THE REGION REGARDING THE TXT LANGUAGE - ACCENTS RECO *
    private final CollationKey key;

    public MyStruct(String parola, String versi, int occurrences, int newline)
    {
        this.parola = parola;
        this.versi = versi;
        this.occurrences = occurrences;
        this.newline = newline;
        this.key = collator.getCollationKey(parola);
    }

    public int compareTo(MyStruct oggetto) 
    {                
     return key.compareTo(oggetto.key);
    }

    public static class Comparators 
    {

      public static Comparator<MyStruct> PAROLA = new Comparator<MyStruct>() 
        {

            public int compare(MyStruct o1, MyStruct o2) 
            {
                return o1.parola.compareTo(o2.parola);
            }
        };


        public static Comparator<MyStruct> OCCORRENZE = new Comparator<MyStruct>() 
        {

            public int compare(MyStruct o1, MyStruct o2) 
            {
                return o1.occurrences - o2.occurrences;
            }
        };

    }
}
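
The paging step of the generator (collect rows, start a new PDF page every so many rows) can also be written as a small pure function, which makes the page size easy to check. By my reading of the loop above, the first page receives 18 entries and the following pages 17, because the counter restarts at 1 after a page break. A sketch in plain Java, with hypothetical names PageSketch and paginate:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageSketch {

  // Splits rows into pages of at most rowsPerPage entries each.
  static List<List<String>> paginate(List<String> rows, int rowsPerPage) {
    if (rowsPerPage <= 0) {
      throw new IllegalArgumentException("rowsPerPage must be positive");
    }
    List<List<String>> pages = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += rowsPerPage) {
      int end = Math.min(start + rowsPerPage, rows.size());
      pages.add(new ArrayList<>(rows.subList(start, end)));
    }
    return pages;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("a: 1", "b: 2", "c: 3", "d: 4", "e: 5");
    List<List<String>> pages = paginate(rows, 2);
    for (int p = 0; p < pages.size(); p++) {
      System.out.println("Page " + (p + 1) + ": " + pages.get(p));
    }
  }
}
```

In the sketch above, each page would then be joined into one string, drawn with text(), and followed by pdf.nextPage() for every page except the last.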

Chrisir · June 2016

Well done!

FromKyoto · June 2016

Well, sorry for not explaining the whole project. I need some time to translate the sheet. Anyway, thanks for the support. I'll credit you, Chrisir.

Edit: I know it is not well programmed, with too many switches, but for a novice coder I guess it's not bad.
