|  | 
 
  
    | 
      
        |  Author | Topic: parser bother  (Read 1024 times) |  |  
  
    | 
      
        | 
          
            | ryan* Guest
 
  
 | 
              
                | parser bother «  on: Feb 20th, 2003, 1:13pm »
 |  |  Hi I was wondering if either Fry or Reas would explain how the parser commands work and what they do. I'm trying to work towards a simple HTML parser that just scans for certain things like images.
 
 splitInts()
 splitFloats()
 splitStrings()
 join()
 
 thanks
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | skloopy 
 
   
 | 
              
                | Re: parser bother « Reply #1 on: Feb 21st, 2003, 2:19am »
 |  |  BTW, sorry if this is too much trouble.. :^)
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | REAS 
 
 
   
 | 
              
                | Re: parser bother « Reply #2 on: Feb 21st, 2003, 3:40am »
 |  |  join() is not implemented yet. probably in _52_
 
 all the splits work like this:
 splitInts(string to be split, token that is separating)
 
 for example:
 String s = "0001+0002";
 int[] data = splitInts( s, '+' );
 println(data[0]);
 println(data[1]);
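
For anyone reading along without Processing to hand, here is a rough plain-Java sketch of what a call like splitInts(s, '+') appears to do, judging only from the example above. The class name SplitSketch and the helper body are made up for illustration; this is not Processing's actual implementation, and the real method may treat empty pieces or whitespace differently.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical stand-in for splitInts(): cut the string at each
    // separator character and parse every piece as a base-10 int.
    static int[] splitInts(String s, char separator) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= s.length(); i++) {
            // A separator (or the end of the string) closes the current piece.
            if (i == s.length() || s.charAt(i) == separator) {
                pieces.add(s.substring(start, i));
                start = i + 1;
            }
        }
        int[] out = new int[pieces.size()];
        for (int i = 0; i < out.length; i++) {
            // Leading zeros are fine for parseInt; an empty piece would throw.
            out[i] = Integer.parseInt(pieces.get(i).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = splitInts("0001+0002", '+');
        System.out.println(data[0]); // 1
        System.out.println(data[1]); // 2
    }
}
```

Note that the leading zeros disappear once the pieces become ints, so "0001" comes back as 1.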
 
 let me know if you need more...
 
 
 
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | skloopy 
 
   
 | 
              
                | Re: parser bother « Reply #3 on: Feb 22nd, 2003, 7:39am »
 |  |  thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? basically all I need to do is take the body or text and search for <img src=" and then get the string from that until the next " . Do you have any tips on how I can do that? I mean I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in processing..
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | benelek 
 
       
 | 
              
                | Re: parser bother « Reply #4 on: Feb 22nd, 2003, 11:33am »
 |  |  the java spec for the String class provides several good methods for operating on a string:
 
 http://java.sun.com/products/jdk/1.2/docs/api/java/lang/String.html
 
 this may take longer than Casey's way, but if you're intent on using built-in java stuff...
 
 Code:
 String theCode = "bladidada <img src=theAddress>";
 int startIndex = theCode.indexOf("<img src=");
 int endIndex = theCode.indexOf(">", startIndex);
 String theAddress = theCode.substring(startIndex+9, endIndex);
 println(theAddress);
 
 | 
 | 
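
A small variation on the snippet above, as a sketch only: it looks for the quoted form <img src=" that the original question asked about, and it checks for -1 from indexOf so a page without an image tag does not throw. The class and method names (ImgSrcIndexOf, firstImgSrc) are invented for this example.

```java
public class ImgSrcIndexOf {
    // Returns the text between <img src=" and the next double quote,
    // or null when either the marker or the closing quote is missing.
    static String firstImgSrc(String html) {
        String marker = "<img src=\"";
        int start = html.indexOf(marker);
        if (start == -1) {
            return null; // no image tag in this exact form
        }
        int from = start + marker.length();
        int end = html.indexOf('"', from);
        if (end == -1) {
            return null; // opening quote never closed
        }
        return html.substring(from, end);
    }

    public static void main(String[] args) {
        System.out.println(firstImgSrc("bladidada <img src=\"pic.gif\"> more"));
    }
}
```

Using marker.length() instead of a hard-coded offset keeps the substring start correct if the marker text is ever changed. The match is still case-sensitive and only finds the first tag.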
 
 -jacob
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | benelek 
 
       
 | 
              
                | Re: parser bother « Reply #5 on: Feb 22nd, 2003, 11:38am »
 |  |  mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript?
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | fry 
 
 
   
 | 
              
                | Re: parser bother « Reply #6 on: Feb 22nd, 2003, 3:33pm »
 |  |  on Feb 22nd, 2003, 11:38am, benelek  wrote:
 | | mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript? | 
 | 
 
 hm, hadn't even thought of implementing it. you might post that to suggestions and see if others are into it as well.
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | fry 
 
 
   
 | 
              
                | Re: parser bother « Reply #7 on: Feb 22nd, 2003, 3:50pm »
 |  |  on Feb 22nd, 2003, 7:39am, Ryan  wrote:
 | | thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? basically all I need to do is take the body or text and search for <img src=" and then get the string from that until the next " . Do you have any tips on how I can do that? I mean I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in processing.. | 
 | 
 
 there are a couple parsers at work behind the scenes, the one that gives us all the quirky stuff with the code is based on oro-matcher, which uses 'regular expressions' as a more advanced way to do pattern matching. (it's a hack to use oro-matcher so it's our fault not theirs) we need to move to a 'real' parser which breaks things into a sort of tree based on a grammar, which will eventually fix those bugs..
 
 split() and friends are simple specific-use methods that we included because we find them useful for our own work. so for instance, if you had your entire html file as a String, you could solve your original problem with:
 
 Code:
 String pieces[] = splitStrings(htmlstring, '<');
 for (int i = 0; i < pieces.length; i++) {
   // uses toLowerCase so that it doesn't care
   // whether it's img src or IMG SRC or IMG src etc
   if (pieces[i].toLowerCase().indexOf("img src=") == 0) {
     // this is an image tag
     // 9 is for the number of characters in: img src="
     String filename = pieces[i].substring(9);
     int quote = filename.indexOf("\"");
     filename = filename.substring(0, quote);

     // now do something with the filename
   }
 }
 | 
 | 
 
 this quickly gets messy when you have to make exceptions for whether or not the page designer put quotes around the filename after src=, or if someone uses a tag like <IMG BORDER=0 SRC=blahblha.gif>. it's not difficult but just gets messy.
 
 this is where a more robust parser comes into play.. regular expressions allow you to do conditional matching (i.e. i can state that quotes are optional) and aren't as brittle. a full parser (not just matching) would do a better job of dealing with those quirky scenarios too, since it's easier to specify those exceptional cases in the parser 'grammar'.
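
To make the "quotes are optional" idea concrete, here is a minimal sketch using the standard java.util.regex package rather than oro-matcher. The class name, method name, and the pattern itself are assumptions written for this example, not anything taken from Processing's own parser. The pattern accepts a double-quoted, single-quoted, or unquoted src value, ignores case, and allows other attributes such as BORDER=0 to come before src.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcRegex {
    // <img, then any attributes, then src= followed by one of:
    //   "value"   (group 1)
    //   'value'   (group 2)
    //   value     (group 3, runs until whitespace or >)
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    static List<String> findImageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // Exactly one of the three groups is set for each match.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    sources.add(m.group(g));
                    break;
                }
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"a.png\"> <IMG BORDER=0 SRC=blahblha.gif></p>";
        for (String src : findImageSources(html)) {
            System.out.println(src);
        }
    }
}
```

This still is not a real HTML parser: it will happily match an img tag inside a comment or a script block, which is the kind of case a grammar-based parser handles better.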
 |  
            | 
              
                | « Last Edit: Feb 22nd, 2003, 3:54pm by fry » |  |  |  |  |  
  
    | 
      
        | 
          
            | benelek 
 
       
 | 
              
                | Re: parser bother « Reply #8 on: Feb 23rd, 2003, 12:27am »
 |  |  i haven't had any experience with regular expressions (besides the stuff that usually comes out of my mouth, hehe), would you mind explaining what they involve?
 |  
            |  |  |  |  
  
    | 
      
        | 
          
            | skloopy 
 
   
 | 
              
                | Re: parser bother « Reply #9 on: Feb 23rd, 2003, 1:58am »
 |  |  Thanks for the help. I'll post the result when I'm done.
 
 It seems like it's impossible right now to import a library (like a parser) into Processing. I've tried the command line method. But it seems like for me, the split() command may be all I need.
 
 The code might get a little long. It might be really cool if you could have multiple java files in a Processing project. Maybe you could use a page metaphor?
 |  
            | 
              
                | « Last Edit: Feb 23rd, 2003, 1:58am by skloopy » |  |  |  |  |  
  
    | 
      
        | 
          
            | benelek 
 
       
 | 
              
                | Re: parser bother « Reply #11 on: Feb 23rd, 2003, 5:39am »
 |  |  cool, thanks Mike.
 |  
            |  |  |  |  
 |