Calculate frequency of word usage in text
in Programming Questions • 2 years ago
Hello Forum
I am trying to create a program that will read a text file, and produce an output on the frequency of the words used.
There are many examples of this around; Daniel Shiffman uses one in his book, searching through King Lear.
However, what I am NOT trying to do is simply count how many times a word appears as the program loops through the text.
Instead, I am trying to produce a series of frequency values, each of which may be shared by many words.
For example, the word 'the' and the word 'and' may both be the most frequently used and, for argument's sake, both appear 200 times in a passage of text.
What I am trying to do is produce this figure, 200, just once, even though two words appear that many times.
Where I am at the moment
I have successfully created a string which pulls in a text file, and split it up into single words (Thank you Mr Shiffman!)
This outputs each word as a single part of the array.
The problem I have is trying to count through the array, and produce a total of word frequencies, as I described above.
If anybody could give me a pointer here, I would be very grateful! It's driving me nuts!
The code
// Declare initial vars
PFont fontA;
String[] myText;                      // The array to hold all of the text
int counter = 1;                      // Where are we in the text
String delimiters = " ,.?!;:[]";
int y = 10;
int x = 10;
int spacing = 20;

void setup() {
  size(1200, 500);
  background(255);
  smooth();
  noLoop();
  fontA = loadFont("Gotham-Medium-16.vlw");
  textFont(fontA, 16);
  // Load the text file into an array, one element per line
  String[] rawText = loadStrings("data/Ruth.txt");
  // Join all the text into one long string. Join with a space so the
  // last word of one line does not fuse with the first word of the next.
  String everything = join(rawText, " ");
  // Now make the declared array called myText,
  // to include all text as single words and remove delimiters
  myText = splitTokens(everything, delimiters);
}

void draw() {
  fill(0);
  // Loop through array, word by word
  for (int i = 0; i < myText.length; i++) {
    // Display the text and data
    y = y + spacing;
    text(i + " My array value = " + myText[i], x, y);
    // Add magic to calculate word frequency
  }
  // Print out to console
  println(myText);
}
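One possible way to fill in the missing step, as a rough sketch rather than a definitive answer: count each word in a map first, then collapse the counts into a set so that a value like 200 shows up only once however many words share it. It is written as plain Java so it can be run on its own. The names FrequencyValues, countWords and distinctFrequencies are made up for this illustration, and treating 'The' and 'the' as the same word is an assumption, not something the post asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class FrequencyValues {

    // Count how many times each word occurs.
    // Assumption: words are compared case-insensitively.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String w : words) {
            String key = w.toLowerCase();
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    // Collapse the per-word counts into distinct values, highest first.
    // A TreeSet keeps each value once and keeps the values sorted.
    static List<Integer> distinctFrequencies(Map<String, Integer> counts) {
        TreeSet<Integer> unique = new TreeSet<Integer>(counts.values());
        return new ArrayList<Integer>(unique.descendingSet());
    }

    public static void main(String[] args) {
        // Stand-in for the myText array produced by splitTokens()
        String[] words = {"the", "cat", "and", "the", "dog", "and", "a", "bird"};
        // 'the' and 'and' both occur twice, but 2 is listed only once
        System.out.println(distinctFrequencies(countWords(words))); // prints [2, 1]
    }
}
```

In the sketch above, the array passed to countWords would be myText, and the resulting list could be drawn with text() in place of the per-word loop.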