I am unsure how to parse a text file containing words and frequency of use into an array

ordeayn · August 2015

"non 145637 di 129302 che 128567 è 105309 e 99519 la 92752 il 83487 un 81762 a 78680 per 59380 in 48533 una 48229 mi 46947 sono 45347 ho 39064 ma 36205 l' 35870 lo 35124 ha 34790 le 34746 si 33241 ti 32019..." is an extract from the text file. I want to get it into a format that associates each number with the word preceding it. The file contains about 500 000 words, each with its frequency of occurrence (taken from Wikipedia). I am very new to Processing, so I apologise for not even having attempted a Processing script. It's the logic for separately identifying the alphabetic and numeric parts that I am not sure about.

GoToLoop · August 2015

https://Processing.org/reference/loadStrings_.html
https://Processing.org/reference/splitTokens_.html
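
If the file really is one long run of alternating word and number tokens, as the extract suggests, the pairing logic could look roughly like this. It is a minimal sketch in plain Java, not a tested Processing sketch: `WordFrequencyParser` and `parse` are hypothetical names, and it assumes the tokens strictly alternate word, number, word, number. In Processing, `loadStrings()` and `splitTokens()` would take the place of the manual splitting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: pairs each word with the number that follows it.
class WordFrequencyParser {

  // Splits the text on whitespace and walks the tokens two at a time:
  // tokens[i] is the word, tokens[i + 1] is its frequency.
  // Assumption: the input strictly alternates word, number, word, number.
  static Map<String, Integer> parse(String text) {
    Map<String, Integer> frequencies = new LinkedHashMap<>(); // keeps file order
    String trimmed = text.trim();
    if (trimmed.isEmpty()) {
      return frequencies;
    }
    String[] tokens = trimmed.split("\\s+");
    for (int i = 0; i + 1 < tokens.length; i += 2) {
      String word = tokens[i];
      String count = tokens[i + 1];
      // Accept only one to nine digits, so the value always fits in an int;
      // any other pair is skipped.
      if (count.matches("\\d{1,9}")) {
        frequencies.put(word, Integer.parseInt(count));
      }
    }
    return frequencies;
  }

  public static void main(String[] args) {
    Map<String, Integer> result = parse("non 145637 di 129302 che 128567");
    for (Map.Entry<String, Integer> entry : result.entrySet()) {
      System.out.println(entry.getKey() + " : " + entry.getValue());
    }
  }
}
```

A map is used here instead of two parallel arrays so that a word can be looked up directly; parallel arrays work just as well if you only need to loop over the pairs.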

Chrisir · August 2015

When there is a line break after each pair (e.g. non 145637), you are almost there.

step 1

    String lines[] = loadStrings("list.txt");

    println("there are " + lines.length + " lines");

    for (int i = 0 ; i < lines.length; i++) {
      println(lines[i]);
    }

What does it give you? Your text file must be named list.txt (or change the name list.txt in the sketch code).

step 2

Now use splitTokens() to divide each line into 2 parts:

    size(1000, 600);

    String[] lines = loadStrings("list.txt");

    println("there are " + lines.length + " lines");
    println("---------------------");

    // one slot per line, so the arrays can never be too small
    String[] wordsList = new String[lines.length];
    int[] frequencyList = new int[lines.length];

    int count = 0;  // number of valid pairs stored so far

    for (int i = 0; i < lines.length; i++) {
      // split the line at whitespace: temp[0] is the word, temp[1] the number
      String[] temp = splitTokens(lines[i]);

      // skip blank or incomplete lines instead of crashing on temp[1]
      if (temp.length < 2) {
        continue;
      }

      println(lines[i] + " -> " + temp[0] + " - " + temp[1]);

      wordsList[count] = temp[0];
      frequencyList[count] = int(temp[1]);
      count++;
    }

    println("---------------------");

    // show result; upper bound is count, the number of pairs actually read
    for (int i = 0; i < count; i++) {
      println(wordsList[i] + " : " + frequencyList[i]);
    }

;-)

ordeayn · August 2015

Thank you so much for your help. I will study the method you have shown me to understand it better. Now I need to display the results in the sketch window.

Chrisir · August 2015

Inside the last for loop, add:

    // show graphically: the word, then a line of length frequency / 500
    text(wordsList[i], 20, i*20+29);
    line(120, i*20+29, 120+frequencyList[i]/500, i*20+29);

ordeayn · August 2015

Thanks very much for your help. Much appreciated

Chrisir · August 2015

There are different ways to do this:

you could scale the size of the words depending on their frequency
you could go 3D
you could have vertical lines
you could place them in a circle
you could have a mouse-over effect that displays the frequency in a small rectangle
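
Two of these ideas (scaling by frequency and placing words in a circle) come down to a little arithmetic. Below is a minimal sketch in plain Java; `LayoutIdeas`, `textSizeFor` and `pointOnCircle` are hypothetical names, and the size limits and radius in `main` are arbitrary choices. In a Processing sketch the results would feed `textSize()` and `text()`.

```java
// Hypothetical helpers for two of the layout ideas above.
class LayoutIdeas {

  // Linearly maps a frequency in [minFreq, maxFreq] to a text size in
  // [minSize, maxSize]. Values outside the range are clamped.
  static float textSizeFor(int freq, int minFreq, int maxFreq,
                           float minSize, float maxSize) {
    if (maxFreq == minFreq) {
      return minSize; // avoid dividing by zero when all frequencies are equal
    }
    float t = (freq - minFreq) / (float) (maxFreq - minFreq);
    t = Math.max(0f, Math.min(1f, t));
    return minSize + t * (maxSize - minSize);
  }

  // Returns {x, y} for item number `index` out of `total` items spread evenly
  // around a circle with centre (cx, cy). Assumes total > 0.
  static float[] pointOnCircle(int index, int total,
                               float cx, float cy, float radius) {
    double angle = 2 * Math.PI * index / total;
    return new float[] {
      cx + radius * (float) Math.cos(angle),
      cy + radius * (float) Math.sin(angle)
    };
  }

  public static void main(String[] args) {
    // "non" is the most frequent word in the extract, "ti" the least frequent
    System.out.println(textSizeFor(145637, 32019, 145637, 10f, 48f));
    System.out.println(textSizeFor(32019, 32019, 145637, 10f, 48f));

    float[] p = pointOnCircle(0, 23, 500f, 300f, 200f);
    System.out.println(p[0] + ", " + p[1]);
  }
}
```

With a list as skewed as this one (a few very frequent words, many rare ones), a linear mapping makes most words almost the same size, so a logarithmic mapping may look better.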

Chrisir · August 2015

At the end of the sketch:

    text("Word frequency", width-111, height-322);
    text("scale 1:500", width-111, height-299);
    println("done ---------------------");

Chrisir · August 2015

Use colors with fill() and thicker bars with rect() instead of my lines.

Chrisir · August 2015

For mouse interaction you need setup() and draw().

When you count your own words, use a HashMap or similar, IIRC.
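
A rough illustration of the HashMap idea, as plain Java: `WordCounter` and `countWords` are hypothetical names, and the splitting is deliberately naive (lowercase, whitespace only, punctuation left attached to the words).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts how often each word occurs in a text.
class WordCounter {

  static Map<String, Integer> countWords(String text) {
    Map<String, Integer> counts = new HashMap<>();
    for (String token : text.toLowerCase().split("\\s+")) {
      if (token.isEmpty()) {
        continue; // leading whitespace produces one empty token
      }
      // insert 1 for a new word, otherwise add 1 to the stored count
      counts.merge(token, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countWords("la casa e la strada");
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      System.out.println(entry.getKey() + " : " + entry.getValue());
    }
  }
}
```

Processing also ships an IntDict class that covers the same use case with less code.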
