We are about to switch to a new forum software. Until then we have removed the registration on this forum.
Hello, I'd like to hear your thoughts on this problem I have with this game. I'm working on a word-game called Boggle. It reads a dictionary file that contains over 200.000 words and then checks if the current board (with letters generated with a specific frequency according to the language) has any of those words. Generating the letters and reading the strings from the file is not a problem. But when I try to check if the words are on the board it takes over 5secs to check which words are possible. Now is my question if there are any faster ways to process this data?
Declarations:
char[][] board = new char[cols][rows];
ArrayList<String> validwords = new ArrayList<String>();
String[] words;
words = loadStrings("dictionary.dic");`
Here is my code to check which words are possible to form with the current board:
boolean wordOnBoard(String a, int x, int y, char[][] b) {
a = a.toUpperCase();
if (a.length()==0) return true;
if (x < 0 || x >= rows || y < 0 || y >= cols) return false;
char firstCharacter = a.charAt(0);
if (b[x][y] != firstCharacter) return false;
char[][] nextBoard = new char[rows][cols];
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; j++) {
nextBoard[i][j] = b[i][j];
}
}
nextBoard[x][y] = '~';
String nextWord = a.substring(1, a.length());
if (wordOnBoard(nextWord, x-1, y-1, nextBoard)) return true;
if (wordOnBoard(nextWord, x-1, y-0, nextBoard)) return true;
if (wordOnBoard(nextWord, x-1, y+1, nextBoard)) return true;
if (wordOnBoard(nextWord, x-0, y-1, nextBoard)) return true;
if (wordOnBoard(nextWord, x-0, y+1, nextBoard)) return true;
if (wordOnBoard(nextWord, x+1, y-1, nextBoard)) return true;
if (wordOnBoard(nextWord, x+1, y-0, nextBoard)) return true;
if (wordOnBoard(nextWord, x+1, y+1, nextBoard)) return true;
return false;
}
Putting the possible words in an ArrayList "validwords":
for (int r=0; r<words.length; r++) {
for (int x=0; x<rows; x++) {
for (int y=0; y<cols; y++) {
if (words[r].length() >= 3 && wordOnBoard(words[r], x, y, board)==true) {
validwords.add(words[r]);
}
}
}
}
Answers
Dunno how to come up w/ a diff. approach. Anyways, here's my optimization attempt: 3:-O
Pushing the optimization a lil' further: \m/
Apparently, there's no need to corrupt the original board array w/
b[x][y] = '~';
.Therefore, no need to use a cloned array argument for wordOnBoard() all this time! ;;)
err...
do you loop over all word in the dict and check for each whether it's on the board?
This must take ages
I did Boggle some days ago
My approach:
the other way round:
Basically take each cell in the board as a starting point
check the words that can be made from there and check if they are in the dict
(initially: Make a list of PVectorInts (write your own class like 2D PVector but with int) that tells us how the path for for each word is, e.g. 1st possible word is 0,0->1,0->2,1 etc.
1 more optimization. This time substring() is removed! o=>
Putting the dictionary in a trie structure might help...
Dunno how could I let this 1 slip thru'! If a validWord is found, there's no need to keep on the double
x
&y
for
loops! X_XThus, prematurely
continue
to the next word would shortcut a couple redundant search for the same already found word: :-BThanks a lot for the help already! ^:)^
@GoToLoop Your method works much faster! Thanks for that. The only problem is that for some reason your method allows the player to use the same char as many times as he wants, which is not allowed in the game Boggle... :-?
@Chrisir Since this is my first Java/Processing application I'm making, I don't know a lot about PVectors and stuff. Is it OK if I PM you my source code so you could have a look at it? :\">
@PhiLho Loading the dictionary file isn't really the problem, it's fast enough with loadStrings(). So a "trie structure" (not exactly sure what it is) won't be really necessary I guess?
Well, I didn't know what a Boggle was about:
https://en.wikipedia.org/wiki/Boggle
Unfortunately, I guess you'd need a real pro & more complicated algorithm like the 1 found below:
http://pages.pathcom.com/~vadco/deep.html
Hmm, but my slow method didn't allow the same char to be used multiple times and I don't know why. o.O
As mentioned @ version 2.2's change log, I've removed
b[x][y] = '~';
and the clone() reset!Seems like that extra optimization has cut off an important aspect of your algorithm!
Anyways, I've transfered the latest optimizations from v2.4.x to v.2.1.x and made this ultimate v2.5.x. Check it out: :D
P.S.: Take notice that any tiny optimization counts. If "dictionary.dic" got all of its letters in uppercase (or lowercase),
there's no real need for
final String ww = w.toUpperCase();
at all.Same case for when all words in that file got at least 4 letters, comment out
if (w.length() < 3) continue;
!Thanks again for the fast answer! Now it works again, but the speed is as slow as in the original method... I guess I'll have to look for a different method like Chrisir said... :/
Edit: Thanks for the info! I removed all words that had less than 3 letters from the dictionary and commented out that line :)
There are several ways of improving the speed but since you have 200,000 words probably the first improvement is too ignore impossible words. For instance if you can't find the word 'gun' then 'gunman', 'gunner' and 'guns' are all impossible, in fact any word that starts with 'gun' should be skipped. Also before testing the word against the grid it would be better to make sure its length is acceptable, not while your scanning the grid.
Taking these into account I came up with the code below, it assumes that the words are in alphabetical order. Also note that the startsWith() method is case sensitive so beware of that
Excellent idea @quark ! I've incorporated your fast-forward algorithm idea as newest v2.6.x: :-\"
P.S.: In order to work w/ those latest modifications, all words gotta be either upper or lower case this time! :(
And for best efficiency, alphabetic order for contiguous similar words is a must! :D
Thanks @quark for your idea! To get this to work, I need to convert my dictionary file to the correct format (alphabetical, UpperCase). You guys know any free software that does this? Here is my dictionary file for english: wouterproject.heliohost.org/processing/english.dic It may take a while to load the page (over 150k words in it).
Edit: Nvm, found it! textmechanic.com/Sort-Text-Lines.html
My algorithm requires the words to be sorted alphabetically. For efficiency this should be done as a one off.
No need for specialist software the sketch below will create a new file called
sorted_words.dic
from your existing dictionary. Simply use the new file in your sketch instaea.Thanks! Here's the result: :D wouterproject.heliohost.org/processing/sorted_words.dic It's already working way faster than my original method!! Thanks a lot guys :)
New v2.7 using arrayCopy() instead of clone()! =:)
P.S.: There are words w/ less than 3 characters in that file yet! :-SS
Forgot to delete them in that file! Thanks :P wouterproject.heliohost.org/processing/sorted_english.dic
Edit: Works like a charm!
Oh, wait... :-O (v2.7) Using arrayCopy() breaks my Letter-Generating function? o.O
Weird! But pay attention that @ v2.7.x,
cloned[][]
variable is declared as a field at the top, just likeboard[][]
! L-)It isn't declared as a local variable within the
for r
loop any longer! b-(For each word, the original board[][] configuration is copied back to cloned[][], restoring it.
Br careful on not allowing wordOnBoard() to corrupt the original board[][]. Pass cloned[][] as argument instead! [..]
Also, changing clone() to arrayCopy() gives an error when running the app on my phone or emulator:
java.lang.OutOfMemoryError: [memory exhausted]
Oh shoots, you're right! Field
cloned[][]
got the same references asboard[][]
due to clone()'s shallow copy! b-(Sorry, my fault! Replace
char[][] cloned = board.clone();
w/char[][] cloned = new char[ROWS][COLS];
.P.S.: If
for (int i = board.length; i-- != 0; arrayCopy(board[i], cloned[i]));
isn't working on your phone, commenting it out.Then uncomment
for (int i = board.length; i-- != 0; cloned[i] = board[i].clone());
v2.73 is amazing! On my phone it now takes under 1 sec to load the game! :D
Wow! Just tell me, was it necessary to use clone() instead of arrayCopy() there? :P
And guess that's still another way to make it even faster? "Threads"!
Like we can split those 150000 about words as 3 concurrent threads of 50000 for example! 8-}
Changing clone() to arrayCopy() made it a lot faster for some reason :P And with the incorrect declaration that is fixed now, I don't need to comment that line out anymore for android.
The next challenge is to "select" the letters by clicking on them and make it so that the player only can click on a letter that is a neighbor of the previously clicked letter.
Some online inspiration:
http://studio.processingtogether.com/sp/pad/export/ro.9ABG0RKl9D2Bx/latest
http://studio.processingtogether.com/sp/pad/export/ro.9L5jAtga7SpyF/latest
Thanks, that worked, now I just need to put in a "neighbor-checker".
I dunno what that "neighbor-checker" is about. Then some questions gotta be defined! :-/
for
loops used to invoke wordOnBoard()?It should be pretty similar to wordOnBoard(). This new checkNeighborings() should check if I click a specific button from the board that the the previously clicked button is next to this button (like we did in wordOnBoard() with chars) so we can highlight the selected buttons. Also it should check if it is the first clicked button, because then there are no previously clicked buttons... :-? Another thing is when the player clicks this button again, it should remove the letter that is on this button from the input string and also remove the highlight on this button (only if it is the last clicked button).
No, I don't think so, since I'm trying to use this boolean selected[][]. (not sure if this is a good method)
It should be called everytime you click a LetterBox.
I guess, but I'm not really sure about this one... :-S
Lemme see if I've got it:
Are those statements above correct? :-??
No, incorrect :P . It's just a function to check if you are allowed to highlight/select the button you click and then add the letter board[i][j] that is on the button to the input string. Maybe we should make it a boolean isNeighborings() to be used in mousePressed(). If true; add letter to the input string and highlight it, if false; do nothing.
The input string gets validated when I press Enter but I already figured out how to do that :) . The wordOnBoard() is correct.
If a player can only choose an adjacent button from previous selected 1 in the grid,
maybe an ArrayList<PVector> to store that sequence (row, col) would be a better approach?
This way, it's much easier to validate each selection based on the immediate previous choice,
since anything 2 rows and/or cols apart isn't adjacent anymore! :P
Once a sequence choice has been confirmed, iterate over it to acquire the resulting String derived from those coordinate pairs;
then compare it to validWords. *-:)
That ArrayList<PVector> can also be used to decide which buttons to highlight in the board! :D
Yes, that would definitely be a better approach! The only problem is that I'm not very familiar with PVector ArrayLists... :\">
EDIT: I'll try to use a PVector :)
It doesn't necessarily need to be an ArrayList. A regular array works as well.
http://processing.org/reference/ArrayList.html
Same for PVector. But then you're gonna need 2 arrays to replace it. 1 for
row[]
and the other forcol[]
.http://processing.org/reference/PVector.html
In the case above, 2 IntList for row and col is even easier!
http://processing.org/reference/IntList.html
P.S.: I thought that both validWords & the board[][] had uppercase letters only?
So why are u still using toUpperCase() method? :-??
Thanks, I'll look into those. Oh yes, forgot to remove it. It's from before the dictionary files were converted to uppercase...
Here's a thought. I can't prove it'll speed it up any, but....
How much will this help? Well, a Boggle board has 16 places on it, so there are at least ten letters that won't show up. Worst case scenario: The ten least-common letters are the ones that don't show up on the board. According to Wikipedia, those are b, j, k, p, q, v, x, y, z, and their relative frequency will add up to roughly 8%. You'll eliminate less than 8% of possible answers, however, due to some words containing more than one of those 10 letters. So, like 6% or 7% increase in efficiency, at a guess. Not great.
Much better case scenario: The letter e has a frequency of 12.7%. If it doesn't show up on your board, that's 12.7% of words you don't have to check, right there. Bewsh! "Holy schist that's awesome" scenario: I've had boggle boards in which the only vowel was 'y'. The letter 'y' shows up in less than 2% of words, and in some of those words it's used only as a consonant. You know what's easier than a 200,000-word list? A 4,000-word list.
I'll admit I haven't done much math to prove that this will help more than any other suggestions, but at least it runs in linear time! This is as opposed to the "for each word, check if it's on the board" approach, which runs in factorial time, ouch. But if you can use a linear-time sub-algorithm to decrease the input size to a factorial-time algorithm, you should score a net profit... right?
my advice:
don't Generate a list of possible words in the first place.
Since you only want to check if the move of the player is valid do a neighbour check and a check we used this cell already in this word (at every click) and when he submits a word check it with your dict
@Shoruke Great idea, I'll try to write an algorithm based on your thoughts when I find the time. :) Anyways, the current algorithm I'm using (optimized by @GoToLoop & @quark) can check for existing words from a 200k list in under 1 second. On my phone!
@Chrisir Thanks for the advice, but I also need to give the option that the player is able to enter words using the keyboard and check the input when they press Enter, so if I don't generate a list of possible words this would be impossible I think.
EDIT: I also want to know how many words are possible at the beginning of the game so I can work with different difficulty modes (if there are less words, it's harder for the player to find words), so this would also require to check for valid words when the game starts, right?
for the validation I still don't think you need a complete list
you write
there you would need such a list of course
and you also want such a list (but you don't need it) when you want to play against the computer as an opponent
I feel it's a wrong move allowing keyboard input! A player should be constrained by the letters present in the board! [..]
A player should form a validWords by clicking a contiguous sequence of letter clickable buttons, am I right? :-&
Yes, I know... But our teacher said it should have both options. :-SS
I think the keyboard is ok as long as it according to what you see on your game board
BAD
ORX
when you type "board" here on the keyboard, its' absolutely ok (since the cells are neighbours)
NENOP
EVEXE
when the user types "even" we can't know where he starts, so in theory we have to check every E and check whether "even" would be possible from that starting position or not (and find a position where it's possible from. When we find none, we don't submit.)
so the last E is not valid as a start, the other Es are
as long as you just want to check if the word "even" is valid - it's no problem
(HOW you lay the word is not entered (or known) when using the keyboard, it could be 6-7-2-3 or 2-7-8-3*, but that's not important)
*given the numbers in the small board above are
12345
67890
Perhaps you should allow keyboard input by board[][] coordinates rather than letters!
Like C3 (ENTER), D4 (ENTER), D5 (ENTER) and so on! /:)
Letter is row & number is column! *-:)
oh no, far to complicate ;-)
mouse
either click each field
or press and hold on the first cell and drag over all cells of the word and release the mouse button on the last cell
keyboard
either have the letters: e,v,e,n etc.
or use cursor keys and have a selected (highlighted) cell and when hit space you add the letter of the cell selected by cursor to your word - q,e,y,c on your keyboard could represent the diagional moves, cursor vertical and horizontal moves of the selected cell.
"Loading the dictionary file isn't really the problem, it's fast enough with loadStrings(). So a "trie structure" (not exactly sure what it is) won't be really necessary I guess?"
You can look in Wikipedia, among other references, what a trie is. You will see it is not about loading the dictionary file, but rather about allowing to do quick lookups in the dictionary. The problem of prefixes is well addressed by this structure.
I used to own a Boggle game that came on a cd, it would allow you to type words and hit enter.
Sounds like a depth-first search. Remember to store what letters you've already been to, you don't want to allow the word "even" when there's only one 'e' on the board!