Processing 1.0 - Processing Discourse - Retracing the steps of splitTokens()

We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.

Rapatsk1

Retracing the steps of splitTokens()
Jun 3rd, 2010, 5:19pm

Hi guys,

So I'm really in love with the splitTokens() method. But now I'd like to find a way to save what the method has split on. I'm working on this 'classy' way to decompose text (as promised here) and have trouble saving the punctuation. The main reason to use splitTokens() is its ability to treat consecutive delimiters as one; I don't really fancy rewriting the method myself.

Case study: in order to split text into sentences, I split on ".!", so that expressions like "..." and "!!" are treated as a single split point. How can I reconstruct this punctuation after splitting?

ben_hem

Re: Retracing the steps of splitTokens()
Reply #1 - Jun 3rd, 2010, 5:47pm

I think your best bet is rewriting the function (sorry :P).

It isn't that bad -- you need the function to run through each character in the string, saving consecutive chunks of punctuation and non-punctuation alike, and labeling them accordingly (probably in an ArrayList or Vector of Strings). So you'd just keep appending the current character to the newest String in the ArrayList, and if it shifts from punctuation to non-punctuation or vice versa, snip that String off and add a new one. Or, if you want to be able to return just words or just punctuation, you could write your own class that contains a boolean and a String, and make an ArrayList of those.
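
A minimal sketch of the approach described above, written as plain Java rather than as a Processing sketch. The names `ChunkSplitter`, `Chunk`, and `chunk()` are hypothetical and chosen only for illustration; nothing here comes from Processing itself. Each chunk stores a run of characters plus a boolean saying whether the run is punctuation.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSplitter {
  // Hypothetical helper type: one run of text plus a flag for "is punctuation".
  static class Chunk {
    final String text;
    final boolean isDelimiter;

    Chunk(String text, boolean isDelimiter) {
      this.text = text;
      this.isDelimiter = isDelimiter;
    }
  }

  // Walks the input one character at a time and starts a new chunk whenever
  // the input switches between delimiter and non-delimiter characters.
  static List<Chunk> chunk(String input, String delims) {
    List<Chunk> chunks = new ArrayList<Chunk>();
    StringBuilder current = new StringBuilder();
    boolean currentIsDelim = false;

    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      boolean isDelim = delims.indexOf(c) >= 0;
      if (current.length() > 0 && isDelim != currentIsDelim) {
        // The kind of character changed: snip off the finished chunk.
        chunks.add(new Chunk(current.toString(), currentIsDelim));
        current.setLength(0);
      }
      current.append(c);
      currentIsDelim = isDelim;
    }
    if (current.length() > 0) {
      chunks.add(new Chunk(current.toString(), currentIsDelim));
    }
    return chunks;
  }

  public static void main(String[] args) {
    for (Chunk c : chunk("Wait... what?! Really.", ".!?")) {
      System.out.println((c.isDelimiter ? "punct" : "text ") + " >" + c.text + "<");
    }
  }
}
```

Because no character is dropped, concatenating the chunks in order gives back the original string, and filtering on `isDelimiter` returns just the words or just the punctuation.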

Rapatsk1

Re: Retracing the steps of splitTokens()
Reply #2 - Jun 5th, 2010, 6:24am

Hi Ben,

Thanks for your reply. I was afraid you'd say that. But indeed, it shouldn't be too difficult. Maybe just a bit annoying. The rest of it is running now, having created the nastiest of for loops I ever saw: http://www.flickr.com/photos/rapatski/4668648029/

Quark

Re: Retracing the steps of splitTokens()
Reply #3 - Jun 5th, 2010, 7:37am

The following code makes use of Java's StringTokenizer class. It will not only split the text but also return the delimiters it split on.
Code:


StringTokenizer toker;
String[] pieces;
String aString;
String delims;

void setup(){
  // String to parse
  aString = "Mary has a, little   lamb.";
  aString += "\nIt's fleece as; white! as snow.";
  // Characters used as separators
  delims = " ,.;!\n";
  
  // Create a tokeniser for this string
  // Last param = true since we want the delimiters as well
  toker = new StringTokenizer(aString, delims, true);
  
  // Assign the field declared above instead of shadowing it with a local
  pieces = new String[toker.countTokens()];
  int index = 0;
  while(toker.hasMoreTokens()){
    pieces[index++] = toker.nextToken();
  }
  
  for(int i = 0; i < pieces.length; i++){
    println(i + "\t>" + pieces[i] + "<");
  }
}

It produces the following output (token 13 is the newline character):

0 >Mary<
1 > <
2 >has<
3 > <
4 >a<
5 >,<
6 > <
7 >little<
8 > <
9 > <
10 > <
11 >lamb<
12 >.<
13 >
<
14 >It's<
15 > <
16 >fleece<
17 > <
18 >as<
19 >;<
20 > <
21 >white<
22 >!<
23 > <
24 >as<
25 > <
26 >snow<
27 >.<

PhiLho

Re: Retracing the steps of splitTokens()
Reply #4 - Jun 5th, 2010, 9:04am

Quark's code is very close to what is inside Processing's splitTokens(); it just adds the option that keeps the delimiters.
If you want to collapse consecutive delimiters, you have to do some additional work:
Code:

StringTokenizer toker;
String[] pieces;
String aString;
String delims;

void setup(){
  // String to parse
  aString = "Mary had ^^ a, little   lamb." +
	"\nwhose fleece was; white!? as snow...";
  // Characters used as separators
  delims = "^,.;:!?";
  
  // Create a tokeniser for this string
  // Last param = true since we want the delimiters as well
  // WHITESPACE is defined by Processing
  toker = new StringTokenizer(aString, delims + WHITESPACE, true);
  
  // Assign the field declared above; countTokens() is an upper bound here
  pieces = new String[toker.countTokens()];
  int index = -1;
  boolean bWasDelim = false;
  while (toker.hasMoreTokens()) {
    String part = toker.nextToken();
    if (WHITESPACE.contains(part))
      continue;  // Just skip these
    if (delims.contains(part)) {
      if (bWasDelim) {
        // Consecutive delimiter: append it to the current run
        pieces[index] += part;
      } else {
        pieces[++index] = part;
        bWasDelim = true;
      }
    } else {
      pieces[++index] = part;
      bWasDelim = false;
    }
  }
  // Get a truncated result since we dropped parts.
  // index is the last filled slot, so the number of pieces is index + 1
  String[] result = Arrays.copyOf(pieces, index + 1);
  
  for (int i = 0; i < result.length; i++) {
    println(i + "\t>" + result[i] + "<");
  }
  exit();
}

Rapatsk1

Re: Retracing the steps of splitTokens()
Reply #5 - Jun 6th, 2010, 6:11am

Whoa that's great dudes, thanks! WHITESPACE is also real nice, didn't know about that one. From your code I conclude it also encompasses \n and \r? And TAB?

Rapatsk1

Re: Retracing the steps of splitTokens()
Reply #6 - Jun 6th, 2010, 6:36am

Hey.. I never visited dev.processing.org, but I guess it can be pretty handy to look into the Processing inner workings! Thanks.
For the sake of future lookups, I'll put that code here too, but I will happily use your solutions.

Code:

static public String[] splitTokens(String what, String delim) {
  StringTokenizer toker = new StringTokenizer(what, delim);
  String pieces[] = new String[toker.countTokens()];

  int index = 0;
  while (toker.hasMoreTokens()) {
    pieces[index++] = toker.nextToken();
  }
  return pieces;
}
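
For comparison, here is a small plain-Java sketch of how the two StringTokenizer constructors used in this thread behave on the same input. The class name `TokenizerDemo` and the helper `tokens()` are made up for this illustration. With the two-argument form (as in the source above) runs of delimiters vanish; with the third argument set to true, every delimiter character comes back as its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
  // Collects all tokens into a list. The returnDelims flag is passed
  // straight through to StringTokenizer's three-argument constructor.
  static List<String> tokens(String what, String delims, boolean returnDelims) {
    StringTokenizer toker = new StringTokenizer(what, delims, returnDelims);
    List<String> out = new ArrayList<String>();
    while (toker.hasMoreTokens()) {
      out.add(toker.nextToken());
    }
    return out;
  }

  public static void main(String[] args) {
    String text = "Stop!! Go...";
    // Delimiters dropped, consecutive ones treated as a single split point
    System.out.println(tokens(text, ".!", false));
    // Each delimiter character kept as a separate one-character token
    System.out.println(tokens(text, ".!", true));
  }
}
```

Note that the space before "Go" stays attached to the word in both cases, because only "." and "!" are passed as delimiters.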
