We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
Retracing the steps of splitTokens()
Jun 3rd, 2010, 5:19pm
 
Hi guys,

So I'm really in love with the splitTokens() method, but now I'd like to find a way to save what the method has split on. I'm working on this 'classy' way to decompose text (as promised here) and I'm having trouble saving the punctuation. The main reason I use splitTokens() is its ability to treat consecutive delimiters as one, and I don't really fancy rewriting the method myself.

Case study: to split text into sentences, I split on ".!", so that expressions like "..." and "!!" count as a single break. How can I reconstruct this punctuation after splitting?
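To make the question concrete, here is a minimal plain-Java sketch (outside Processing) that mimics splitTokens() with java.util.StringTokenizer, assuming it behaves like the Processing source quoted later in this thread. The class name SplitLossDemo and the sample sentence are made up for illustration; the point is that the "...", "!!" and "." simply disappear.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class SplitLossDemo {
    // Mimics Processing's splitTokens(): consecutive delimiters collapse
    // into one break, and the delimiters themselves are discarded.
    static String[] splitTokens(String what, String delim) {
        StringTokenizer toker = new StringTokenizer(what, delim);
        String[] pieces = new String[toker.countTokens()];
        int index = 0;
        while (toker.hasMoreTokens()) {
            pieces[index++] = toker.nextToken();
        }
        return pieces;
    }

    public static void main(String[] args) {
        String[] sentences = splitTokens("Wait... Really!! Yes.", ".!");
        // Prints [Wait,  Really,  Yes]: the punctuation is gone
        System.out.println(Arrays.toString(sentences));
    }
}
```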
Re: Retracing the steps of splitTokens()
Reply #1 - Jun 3rd, 2010, 5:47pm
 
I think your best bet is rewriting the function (sorry :P).

It isn't that bad: the function needs to run through each character in the string, saving consecutive chunks of punctuation and non-punctuation alike and labeling them accordingly (probably in an ArrayList or Vector of Strings). You keep appending the current character to the newest String in the list, and whenever the text shifts from punctuation to non-punctuation or vice versa, you close that String and start a new one. If you want to be able to return just the words or just the punctuation, you could write your own class that holds a boolean and a String, and keep an ArrayList of those.
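The approach above might look roughly like this in plain Java. This is a sketch under assumptions: the names Chunker, Chunk and chunk() are hypothetical, and "punctuation" is simply whatever set of characters you pass in.

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // One run of consecutive characters, labelled as punctuation or not
    static class Chunk {
        final boolean isPunct;
        final StringBuilder text = new StringBuilder();

        Chunk(boolean isPunct) { this.isPunct = isPunct; }

        @Override
        public String toString() { return (isPunct ? "P:" : "W:") + text; }
    }

    // Walks the string once and starts a new chunk whenever the character
    // switches between punctuation and non-punctuation.
    static List<Chunk> chunk(String s, String punct) {
        List<Chunk> chunks = new ArrayList<>();
        Chunk current = null;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean p = punct.indexOf(c) >= 0;
            if (current == null || current.isPunct != p) {
                current = new Chunk(p);
                chunks.add(current);
            }
            current.text.append(c);
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (Chunk c : chunk("Wait... Really!! Yes.", ".!")) {
            System.out.println(c);
        }
    }
}
```

Because every character lands in exactly one chunk, joining the chunks' text reproduces the input, and filtering on isPunct gives you just the words or just the punctuation.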
Re: Retracing the steps of splitTokens()
Reply #2 - Jun 5th, 2010, 6:24am
 
Hi Ben,

Thanks for your reply. I was afraid you'd say that. ;) But indeed, it shouldn't be too difficult, just a bit annoying.

The rest of it is running now, though I ended up with the nastiest for loop I've ever seen: http://www.flickr.com/photos/rapatski/4668648029/
Re: Retracing the steps of splitTokens()
Reply #3 - Jun 5th, 2010, 7:37am
 
The following code uses Java's StringTokenizer class. It not only splits the text but also returns the delimiters as tokens, so you know which ones were used.
Code:

import java.util.StringTokenizer;

StringTokenizer toker;
String[] pieces;
String aString;
String delims;

void setup() {
  // String to parse
  aString = "Mary has a, little   lamb.";
  aString += "\nIt's fleece as; white! as snow.";
  // Characters used as separators
  delims = " ,.;!\n";

  // Create a tokenizer for this string.
  // Last param = true since we want the delimiters returned as tokens too
  toker = new StringTokenizer(aString, delims, true);

  // Assign to the field instead of declaring a local that shadows it
  pieces = new String[toker.countTokens()];
  int index = 0;
  while (toker.hasMoreTokens()) {
    pieces[index++] = toker.nextToken();
  }

  for (int i = 0; i < pieces.length; i++) {
    println(i + "\t>" + pieces[i] + "<");
  }
}


It produces the following output (token 13 is the newline character, which is why its closing < appears on the next line):


0      >Mary<
1      > <
2      >has<
3      > <
4      >a<
5      >,<
6      > <
7      >little<
8      > <
9      > <
10      > <
11      >lamb<
12      >.<
13      >
<
14      >It's<
15      > <
16      >fleece<
17      > <
18      >as<
19      >;<
20      > <
21      >white<
22      >!<
23      > <
24      >as<
25      > <
26      >snow<
27      >.<
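Because returnDelims is true, no character is thrown away: every delimiter comes back as its own token. A small plain-Java sketch (the class and method names are hypothetical) confirms that joining the tokens rebuilds the original string exactly, which is what makes it possible to reconstruct the punctuation afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class RoundTrip {
    // Tokenize with returnDelims = true so delimiters come back as tokens
    static List<String> tokens(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer toker = new StringTokenizer(text, delims, true);
        while (toker.hasMoreTokens()) {
            out.add(toker.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Mary has a, little   lamb.\nIt's fleece as; white! as snow.";
        List<String> parts = tokens(text, " ,.;!\n");
        // Joining the tokens back together reproduces the input exactly
        String rebuilt = String.join("", parts);
        System.out.println(parts.size());         // 28
        System.out.println(rebuilt.equals(text)); // true
    }
}
```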

Re: Retracing the steps of splitTokens()
Reply #4 - Jun 5th, 2010, 9:04am
 
Quark's code is very close to what is inside Processing's splitTokens(); it just adds the option to keep the delimiters.
If you want to collapse consecutive delimiters, you have to do some additional work:
Code:
import java.util.Arrays;
import java.util.StringTokenizer;

StringTokenizer toker;
String[] pieces;
String aString;
String delims;

void setup() {
  // String to parse
  aString = "Mary had ^^ a, little lamb." +
    "\nwhose fleece was; white!? as snow...";
  // Characters used as separators
  delims = "^,.;:!?";

  // Create a tokenizer for this string.
  // Last param = true since we want the delimiters as well.
  // WHITESPACE is defined by Processing
  toker = new StringTokenizer(aString, delims + WHITESPACE, true);

  pieces = new String[toker.countTokens()];
  int index = -1;  // index of the last filled slot
  boolean bWasDelim = false;
  while (toker.hasMoreTokens()) {
    String part = toker.nextToken();
    if (WHITESPACE.contains(part))
      continue; // Just skip these
    if (delims.contains(part)) {
      if (bWasDelim) {
        pieces[index] += part;  // extend the current run of delimiters
      } else {
        pieces[++index] = part;
        bWasDelim = true;
      }
    } else {
      pieces[++index] = part;
      bWasDelim = false;
    }
  }
  // Get a truncated result since we dropped parts.
  // index is the last used slot, so the length is index + 1
  String[] result = Arrays.copyOf(pieces, index + 1);

  for (int i = 0; i < result.length; i++) {
    println(i + "\t>" + result[i] + "<");
  }
  exit();
}
Re: Retracing the steps of splitTokens()
Reply #5 - Jun 6th, 2010, 6:11am
 
Whoa, that's great, dudes, thanks!

WHITESPACE is also really nice; I didn't know about that one. From your code I gather it also covers \n and \r? And tab?
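As far as I know, Processing's WHITESPACE constant is defined in PConstants as space, tab, newline, carriage return, form feed and the no-break space (\u00A0), so \n, \r and tab are all included. The plain-Java sketch below copies that value as an assumption (check it against the Processing version you use) and shows its practical effect when used as a StringTokenizer delimiter set. WhitespaceDemo and words() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WhitespaceDemo {
    // Assumption: mirrors Processing's PConstants.WHITESPACE
    // (space, tab, newline, carriage return, form feed, no-break space)
    static final String WHITESPACE = " \t\n\r\f\u00A0";

    // Splits on any run of the whitespace characters above
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer toker = new StringTokenizer(text, WHITESPACE);
        while (toker.hasMoreTokens()) {
            out.add(toker.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Tabs, CR/LF pairs and runs of spaces all act as separators
        System.out.println(words("one\ttwo\r\nthree   four")); // [one, two, three, four]
    }
}
```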
Re: Retracing the steps of splitTokens()
Reply #6 - Jun 6th, 2010, 6:36am
 
Hey, I had never visited dev.processing.org, but I guess it can be pretty handy for looking into Processing's inner workings! Thanks.
For future reference, I'll put that code here too, but I'll happily use your solutions. :)

Code:
static public String[] splitTokens(String what, String delim) {
  StringTokenizer toker = new StringTokenizer(what, delim);
  String pieces[] = new String[toker.countTokens()];

  int index = 0;
  while (toker.hasMoreTokens()) {
    pieces[index++] = toker.nextToken();
  }
  return pieces;
}
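For the sentence case study from the first post, a helper built on the same StringTokenizer (with returnDelims set to true) could glue each run of terminators back onto the sentence it ends. This is a sketch, not a Processing function: SentenceSplit and splitSentences() are hypothetical names, and trimming the surrounding spaces is a design choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SentenceSplit {
    // Splits on any run of the given terminator characters and attaches
    // that run (e.g. "..." or "!!") to the sentence before it.
    static List<String> splitSentences(String text, String terminators) {
        List<String> sentences = new ArrayList<>();
        StringTokenizer toker = new StringTokenizer(text, terminators, true);
        StringBuilder current = new StringBuilder();
        boolean inTerminatorRun = false;
        while (toker.hasMoreTokens()) {
            String tok = toker.nextToken();
            // Delimiters are returned one character at a time
            boolean isTerminator = tok.length() == 1
                    && terminators.indexOf(tok.charAt(0)) >= 0;
            if (!isTerminator && inTerminatorRun) {
                // A run of terminators just ended: close the sentence
                sentences.add(current.toString().trim());
                current.setLength(0);
            }
            current.append(tok);
            inTerminatorRun = isTerminator;
        }
        if (current.toString().trim().length() > 0) {
            sentences.add(current.toString().trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("Wait... Really!! Yes.", ".!"));
        // [Wait..., Really!!, Yes.]
    }
}
```

Treating a run such as "..." or "!!" as one break mirrors how splitTokens() collapses consecutive delimiters, except that the punctuation is kept instead of discarded.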

