text processing

edited March 2017 in How To...

Thanks

Answers

  • What code do you have so far? You need to look up the following terms in the reference:

    ArrayList, split, splitTokens, setup, draw, dictionary, just for starters.

    https://www.processing.org/reference

    Kf

  • post your attempts.

  • 1)

    String[] ret = {};
    
              for ( int idx = 0; idx < words.length; ++idx ) {
                if (words[idx].length() >= 6) {
                  for (int i = 0; i < -1; i++) {
                    words[idx].charAt(i);
                    //dont know how to check for double letters
                    ret = append( ret, words[idx]);
                  }
                }
              }
    
              return ret;
            }
    

    2)I feel like this doesn't work at all

      String[] ret = {}; 
    
      for ( int idx = 0; idx < words.length; ++idx ) {
        if (words[idx].length() >= 6) {
          String [] newwords = splitTokens(words[idx]); 
          sort(newwords); 
          String newnewwords = join(newwords, ""); 
          if (newnewwords != words[idx]) {
            ret = append( ret, words[idx]);
          }
        }
      }
    
      return ret;
    } 
    

    3)

      String[] ret = {}; 
    
      for ( int idx = 0; idx < words.length; ++idx ) {
        if (words[idx].length() >= 14) {
          int value = words[idx].length();
          for (int i = 0; i < value; i++) {
            if (getCount(words[idx], words[idx].charAt(i)) <= 2) {
              ret = append( ret, words[idx]);
            }
          }
        }
      }
      return ret;
    }
    
    int getCount( String word, char letter ) {
      int repeated = 0;
      for (int i = 0; i < word.length(); i++) { 
        if (word.charAt(i) == letter) {
          repeated += 1;
        }
      }
      return repeated;
    }
    
  • I feel like this doesn't work at all

    Comments like this are very very useful. You could add more details explaining a bit further.

    For this challenge, I encourage you to use an ArrayList instead of arrays, since it is meant for this kind of task. Personally, I stick to arrays only when they have a fixed size. In your case you need a container whose size changes dynamically, so ArrayList is your best (better?) option.

    Can you provide a sample of the content of your array words?

    Also, please check the reference for StringList (an alternative to ArrayList) and for String and its associated methods (charAt(), for example).
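
As a rough illustration of the two suggestions above (a growable list instead of repeated append() calls, and charAt() for looking at neighbouring letters), here is a minimal sketch in plain Java, which Processing is built on. The class name WordFilter, the method names, and the sample words are made up for the example; inside a sketch the method bodies could be used without the static keyword, and StringList could stand in for ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

public class WordFilter {
  // Collects every word with at least minLength characters.
  // The list grows as needed, so no array has to be copied on each add.
  static List<String> longWords(String[] words, int minLength) {
    List<String> ret = new ArrayList<>();
    for (String w : words) {
      if (w.length() >= minLength) {
        ret.add(w);
      }
    }
    return ret;
  }

  // Returns true if some character is immediately followed by the same character.
  static boolean hasDoubleLetter(String word) {
    for (int i = 1; i < word.length(); i++) {   // start at 1 so i-1 is valid
      if (word.charAt(i) == word.charAt(i - 1)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hypothetical sample data, not the asker's word list.
    String[] words = {"world", "bookkeeper", "ozone", "committee"};
    for (String w : longWords(words, 6)) {
      System.out.println(w + " has a double letter: " + hasDoubleLetter(w));
    }
  }
}
```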

    Kf

  • edited March 2017

    obvious homework

    Can you post your entire sketch so we can run it?

    what about #4?

    Look at HashMap also.

    https://www.processing.org/reference/

  • (Use a HashSet if you only need to store the fact that you've seen an item.)
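
To make the difference between the two containers concrete, here is a small plain-Java sketch. It assumes the goal is to reason about repeated letters in a word, as in attempt 3; the class name SeenLetters and both method names are hypothetical. A HashSet only records that a letter was seen, while a HashMap also keeps a count per letter.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SeenLetters {
  // HashSet: enough when you only need to know whether a letter occurred before.
  static boolean hasRepeatedLetter(String word) {
    Set<Character> seen = new HashSet<>();
    for (int i = 0; i < word.length(); i++) {
      // add() returns false when the character is already in the set
      if (!seen.add(word.charAt(i))) {
        return true;
      }
    }
    return false;
  }

  // HashMap: needed when you want to know how often each letter occurs.
  static Map<Character, Integer> letterCounts(String word) {
    Map<Character, Integer> counts = new HashMap<>();
    for (char c : word.toCharArray()) {
      counts.merge(c, 1, Integer::sum);   // start at 1, otherwise add 1
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(hasRepeatedLetter("ocean"));   // no letter occurs twice
    System.out.println(hasRepeatedLetter("ozone"));   // 'o' occurs twice
    System.out.println(letterCounts("ozone"));
  }
}
```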

  • Question 1:

    This is the logic:

    1. Store the word to be analyzed in aw, which stands for "a word".
    2. Compare adjacent characters starting from char position 1 (not zero), so that you compare the current character to the previous one.
    3. If adjacent chars are the same:
      ** Increase dblCtr, as a double letter was found.
      ** Set step to zero. step keeps track of the distance between double-letter sets. If step reaches two, reset dblCtr to zero.
      ** Increase len, which keeps track of how many repeated letters are in a set of repeated contiguous letters. If a set holds more than two letters, reset dblCtr to zero.
    4. If dblCtr==3 and step==1, we are done. Continue with the next word from the array of words provided.

    Notice that this algorithm is very specific to the following assumptions:
    1. Exactly two letters in each repeated set
    2. At least three contiguous repeated sets

    Is there a library that already does this kind of operation for you? No doubt, but such libraries are for very specific tasks. Sometimes one is just better off writing the code from scratch. I am afraid you might not follow the logic I used here. If you really want to do this exercise by yourself, you need to forget about the coding part and work on the concept first. You need to design your algorithm. Start by generating a set of words that covers all possible cases. I did that below, and I attached a valid or invalid token to each word to indicate which is which and to use them to test the algorithm.

    Then you need to do the hard part. Take the words one by one and, using pen and paper, come up with a set of rules to apply to your words. When you have defined your set of rules, design the algorithm: what to take into account, what to look for, what to avoid, what to discard, etc.

    Then the last part is to write the code. I hope this helps.

    Note: Is this the only algorithm? Def not! Creative coding anyone?

    Kf

    String[] words = {"333444555_invalid","world", "44444455_invalid", "ozone", "NNYYCC_valid", "ocean", "lliightt_invalid", "12334455987_valid", "18004466123_valid", "123rroomeoo_invalid","1122333_invalid"};
    StringList ret= new StringList();
    
    
    for ( int idx = 0; idx < words.length; idx++ ) {
    
      String aw=words[idx];
      int dblCtr=0;            //keeps track of consecutive double letters
      boolean done=false;      //true once three contiguous double letters are found
      int step=0;              //separation between double-letter groups
      int len=0;               //keeps track of how many letters are in a set of repeated letters
    
      for (int i = 1; i < aw.length() && !done; i++) {  //skip first
        char prev=aw.charAt(i-1);
        char curr=aw.charAt(i);
    
        if (curr==prev) {
          dblCtr++;
          step=0;
          len++;
        } else{
          step++;
          len=0;
        }
    
        if (step==2){
          dblCtr=0;
          len=0;
        }
    
        if(len==2){
          dblCtr=0;
          len=0;
        }
    
    
        if (dblCtr==3 && step==1) {   
          ret.append(aw);
          done=true;
        }
      }
    }
    
    println("Found \"three double letters in a row\" in:");
    for (String ss : ret)
      println(ss);
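
One detail of the loop above: it appends a word only when a non-matching character follows the third pair (step==1), so a word that ends exactly on the third pair, such as "NNYYCC" without a suffix, would not be reported. A possible variant, sketched in plain Java under the same two assumptions, measures the length of each run of equal characters and counts consecutive runs of length two. The class and method names are invented for this example, and it is one way to do it rather than the intended solution.

```java
public class TripleDoubles {
  // True if the word contains three back-to-back runs of exactly two equal characters.
  static boolean hasThreeDoublesInARow(String word) {
    int pairs = 0;                       // consecutive runs of length exactly 2
    int i = 0;
    while (i < word.length()) {
      int j = i;
      // advance j to the end of the run that starts at i
      while (j < word.length() && word.charAt(j) == word.charAt(i)) {
        j++;
      }
      int runLength = j - i;
      pairs = (runLength == 2) ? pairs + 1 : 0;   // any other run length breaks the chain
      if (pairs == 3) {
        return true;
      }
      i = j;                             // continue with the next run
    }
    return false;
  }

  public static void main(String[] args) {
    // Same test words as in the post above.
    String[] words = {"333444555_invalid", "world", "44444455_invalid", "ozone",
      "NNYYCC_valid", "ocean", "lliightt_invalid", "12334455987_valid",
      "18004466123_valid", "123rroomeoo_invalid", "1122333_invalid"};
    for (String w : words) {
      if (hasThreeDoublesInARow(w)) {
        System.out.println(w);           // prints only the three *_valid words
      }
    }
  }
}
```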
    
  • Another algorithm is to check the difference between contiguous letters and generate a binary string where 0 means the contiguous letters are the same and 1 means they are different.

    This boils down to pattern recognition. Can you figure out the pattern? After you figure the pattern, you can use the following function:

    https://processing.org/reference/match_.html <====huge hint btw!

    However, you have to watch for some patterns that are not allowed by your requirements in order to get the right final answer.

    Kf

    String[] words = {"333444555_invalid", "world", "44444455_invalid", "ozone", "NNYYCC_valid", "ocean", "lliightt_invalid", "12334455987_valid", "18004466123_valid", "123rroomeoo_invalid", "1122333_invalid"};
    StringList ret= new StringList();
    StringList dw=new StringList();
    
    for ( int idx = 0; idx < words.length; idx++ ) {
    
      String aw=words[idx];
    
      dw.clear();
      for (int i = 1; i < aw.length(); i++) {  //skip first
        char prev=aw.charAt(i-1);
        char curr=aw.charAt(i);
        int diffChr=curr-prev;
    
        dw.append(diffChr==0?"0":"1");
      }
    
      print(aw + ":\n");
      for (String ss : dw)
        print(ss);
      println();
    }
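
For readers who want to check their own answer, here is one possible reading of the pattern, sketched in plain Java with java.util.regex. It assumes that a pair shows up as a single 0, that three pairs in a row therefore read 01010, and that a 0 directly before or after that stretch means a run longer than two letters, which the requirements exclude. The class name, method names, and the regular expression are this example's own, not something given in the thread. In a sketch, Processing's match() returns null when nothing matches, so the same expression could be tested there with a null check.

```java
import java.util.regex.Pattern;

public class DiffPattern {
  // Builds the binary string: '0' where a character equals its predecessor, '1' otherwise.
  static String diffString(String word) {
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i < word.length(); i++) {   // skip first character
      sb.append(word.charAt(i) == word.charAt(i - 1) ? '0' : '1');
    }
    return sb.toString();
  }

  // 01010 = pair, change, pair, change, pair.
  // The lookbehind and lookahead reject a neighbouring 0 (a run of three or more letters).
  private static final Pattern THREE_PAIRS = Pattern.compile("(?<!0)01010(?!0)");

  static boolean hasThreeDoublesInARow(String word) {
    return THREE_PAIRS.matcher(diffString(word)).find();
  }

  public static void main(String[] args) {
    String[] words = {"NNYYCC_valid", "1122333_invalid", "lliightt_invalid"};
    for (String w : words) {
      System.out.println(w + " -> " + diffString(w) + " -> " + hasThreeDoublesInARow(w));
    }
  }
}
```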
    
  • Don't post solutions; it's homework.

    Write it for yourself and then don't post it.

  • Be aware that the letter case is important

    'F' < 'f' 
    'F' < 'c' 
    

    so

    "Fast" < "call" 
    
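
The comparisons above are shorthand: chars can be compared with < directly, but Java (and therefore Processing) does not allow < on Strings, so compareTo() is the usual way to ask which String sorts first. A minimal plain-Java sketch follows; the class name CaseOrder and the helper sortsBefore are made up for illustration.

```java
public class CaseOrder {
  // True if a sorts strictly before b.
  // With ignoreCase set, upper and lower case letters are treated as equal.
  static boolean sortsBefore(String a, String b, boolean ignoreCase) {
    int cmp = ignoreCase ? a.compareToIgnoreCase(b) : a.compareTo(b);
    return cmp < 0;
  }

  public static void main(String[] args) {
    System.out.println('F' < 'f');                           // true: 70 < 102
    System.out.println('F' < 'c');                           // true: 70 < 99
    System.out.println(sortsBefore("Fast", "call", false));  // true: 'F' sorts before 'c'
    System.out.println(sortsBefore("Fast", "call", true));   // false: 'f' sorts after 'c'
  }
}
```

If case should not matter when comparing or sorting words, converting them with toLowerCase() first is another common option.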
This discussion has been closed.