This is the archive Discourse for the Processing (ALPHA) software.
Please visit the new Processing forum for current information.

   parser bother

Topic: parser bother (Read 1024 times)
ryan*
Guest
parser bother
« on: Feb 20th, 2003, 1:13pm »

Hi, I was wondering if either Fry or Reas could explain how the parser commands work and what they do. I'm working toward a simple HTML parser that just scans for certain things, like images.
 
splitInts()
splitFloats()
splitStrings()
join()
 
thanks
 
skloopy

Re: parser bother
« Reply #1 on: Feb 21st, 2003, 2:19am »

BTW, sorry if this is too much trouble.. :^)
 
REAS


Re: parser bother
« Reply #2 on: Feb 21st, 2003, 3:40am »

join() is not implemented yet. probably in _52_
 
all the splits work like this:
splitInts(string to be split, token that is separating)
 
for example:
String s = "0001+0002";
int[] data = splitInts( s, '+' );
println(data[0]);
println(data[1]);
 
let me know if you need more...
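For anyone following along outside Processing, here is a rough plain-Java stand-in for the splitInts() call above. It is a sketch that assumes each piece is a plain base-10 integer; it is not Processing's actual implementation, and the class name is made up for this example.

```java
import java.util.regex.Pattern;

public class SplitIntsSketch {
    // Hypothetical stand-in for Processing's splitInts(): splits on one
    // separator character and parses each piece as a base-10 int.
    static int[] splitInts(String s, char separator) {
        // Pattern.quote stops characters like '+' being read as regex syntax
        String[] pieces = s.split(Pattern.quote(String.valueOf(separator)));
        int[] result = new int[pieces.length];
        for (int i = 0; i < pieces.length; i++) {
            result[i] = Integer.parseInt(pieces[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = splitInts("0001+0002", '+');
        System.out.println(data[0]); // prints 1
        System.out.println(data[1]); // prints 2
    }
}
```

Note that the leading zeros disappear because the pieces are converted to ints, so "0001" comes back as 1.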
 
 
 
skloopy

Re: parser bother
« Reply #3 on: Feb 22nd, 2003, 7:39am »

thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? Basically, all I need to do is take the body of text, search for <img src=" and then get the string from there up to the next " character. Do you have any tips on how I can do that? I mean, I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in Processing..
 
benelek

Re: parser bother
« Reply #4 on: Feb 22nd, 2003, 11:33am »

the java spec for the String class provides several good methods for operating on a string:
 
http://java.sun.com/products/jdk/1.2/docs/api/java/lang/String.html
 
this may take longer than Casey's way, but if you're intent on using built-in java stuff...
 
Code:

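String theCode = "bladidada <img src=theAddress>";
String tag = "<img src=";
int startIndex = theCode.indexOf(tag);
// indexOf returns -1 when the text isn't found, so check before using it
int endIndex = (startIndex == -1) ? -1 : theCode.indexOf(">", startIndex);
if (endIndex != -1) {
  // the address runs from just after the tag text up to the closing '>'
  String theAddress = theCode.substring(startIndex + tag.length(), endIndex);
  println(theAddress);
}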

 
-jacob
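The snippet above only finds the first tag. Repeating the same indexOf()/substring() idea with a moving start position collects every address in a page. This is a plain-Java sketch that keeps the simple assumption that the address ends at the next '>'; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ImgAddressScan {
    // Hypothetical helper: collects the text between every "<img src=" and the next '>'.
    static List<String> findImgAddresses(String html) {
        String tag = "<img src=";
        List<String> addresses = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(tag, from);
            if (start == -1) break;                 // no more tags
            int end = html.indexOf('>', start);
            if (end == -1) break;                   // unterminated tag; stop scanning
            addresses.add(html.substring(start + tag.length(), end));
            from = end + 1;                         // resume searching after this tag
        }
        return addresses;
    }

    public static void main(String[] args) {
        String page = "intro <img src=a.gif> text <img src=b.jpg> end";
        System.out.println(findImgAddresses(page)); // prints [a.gif, b.jpg]
    }
}
```

Like the original, this is case-sensitive and does not strip quotes around the address.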
 
benelek

Re: parser bother
« Reply #5 on: Feb 22nd, 2003, 11:38am »

mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript?
 
fry


Re: parser bother
« Reply #6 on: Feb 22nd, 2003, 3:33pm »

on Feb 22nd, 2003, 11:38am, benelek wrote:
mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript

 
hm, hadn't even thought of implementing it. you might post that to suggestions and see if others are into it as well.
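Some background on why the two aren't interchangeable: in Java, which Processing builds on, the quote style decides the type. Single quotes make a char literal (exactly one character) and double quotes make a String of any length, which is why the splitInts() example earlier passes '+' in single quotes. A small plain-Java illustration (class name made up):

```java
public class QuoteDemo {
    public static void main(String[] args) {
        char c = '+';            // single quotes: exactly one character, type char
        String s = "0001+0002";  // double quotes: a String of any length
        // String bad = 'ab';    // would not compile: a char literal holds one character
        System.out.println(s.indexOf(c));                  // prints 4
        System.out.println(String.valueOf(c).equals("+")); // prints true
    }
}
```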
 
fry


Re: parser bother
« Reply #7 on: Feb 22nd, 2003, 3:50pm »

on Feb 22nd, 2003, 7:39am, Ryan wrote:
thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? Basically, all I need to do is take the body of text, search for <img src=" and then get the string from there up to the next " character. Do you have any tips on how I can do that? I mean, I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in Processing..

 
there are a couple of parsers at work behind the scenes. the one that gives us all the quirky behavior with the code is based on oro-matcher, which uses 'regular expressions', a more advanced way to do pattern matching. (using oro-matcher that way is a hack, so the quirks are our fault, not theirs.) we need to move to a 'real' parser, which breaks the code into a sort of tree based on a grammar; that will eventually fix those bugs.
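To make "a tree based on a grammar" a bit more concrete, here is a minimal recursive-descent parser in plain Java for a toy arithmetic grammar. It only illustrates the general technique; it is not how Processing's own parser works, and the grammar, class, and method names are invented for this example. Each grammar rule becomes one method, and each method returns a subtree.

```java
public class ToyParser {
    // A tree node: either a number leaf or an operator with two children.
    static class Node {
        final String value;
        final Node left, right;
        Node(String value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
        @Override public String toString() {
            // leaves print as themselves, operators in prefix form: (+ 1 (* 2 3))
            return left == null ? value : "(" + value + " " + left + " " + right + ")";
        }
    }

    private final String src;
    private int pos = 0;

    ToyParser(String src) { this.src = src.replace(" ", ""); }

    // expr := term ('+' term)*
    Node expr() {
        Node n = term();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            n = new Node("+", n, term());
        }
        return n;
    }

    // term := factor ('*' factor)*   (so '*' binds tighter than '+')
    Node term() {
        Node n = factor();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            n = new Node("*", n, factor());
        }
        return n;
    }

    // factor := NUMBER | '(' expr ')'
    Node factor() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;                       // consume '('
            Node n = expr();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;                       // consume ')'
            return n;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return new Node(src.substring(start, pos), null, null);
    }

    static Node parse(String text) {
        ToyParser p = new ToyParser(text);
        Node tree = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3"));   // prints (+ 1 (* 2 3))
        System.out.println(parse("(1 + 2) * 3")); // prints (* (+ 1 2) 3)
    }
}
```

Unlike pattern matching, the grammar makes precedence and nesting explicit, so input like "(1 + 2) * 3" ends up in the right shape instead of depending on what a pattern happens to catch.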
 
split() and friends are simple specific-use methods that we included because we find them useful for our own work. so for instance, if you had your entire html file as a String, you could solve your original problem with:
 
Code:
String pieces[] = splitStrings(htmlstring, '<');
for (int i = 0; i < pieces.length; i++) {
  // uses toLowerCase() so that it doesn't care
  // whether it's img src or IMG SRC or IMG src etc
  if (pieces[i].toLowerCase().indexOf("img src=\"") == 0) {
    // this is an image tag
    // 9 is for the number of characters in: img src="
    String filename = pieces[i].substring(9);
    int quote = filename.indexOf("\"");
    // skip malformed tags that have no closing quote
    if (quote != -1) {
      filename = filename.substring(0, quote);
      // now do something with the filename
    }
  }
}

 
this quickly gets messy when you have to make exceptions for whether or not the page designer put quotes around the filename after src=, or for someone using a tag like <IMG BORDER=0 SRC=blahblah.gif>. it's not difficult, it just gets messy.
 
this is where a more robust parser comes into play. regular expressions allow you to do conditional matching (e.g. i can state that the quotes are optional), so they aren't as brittle. a full parser (not just matching) would do an even better job of dealing with those quirky scenarios, since it's easier to specify those exceptional cases in the parser's 'grammar'.
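As a rough illustration of the "quotes are optional" point, here is a sketch using Java's standard java.util.regex package (not oro-matcher). The pattern is an assumption about what counts as an image tag and is still far from a complete HTML parser, but it copes with upper or lower case, other attributes before SRC, and double-quoted, single-quoted, or unquoted values. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcRegex {
    // <img, then any other attributes, then src= followed by a
    // double-quoted, single-quoted, or unquoted value.
    // CASE_INSENSITIVE makes IMG SRC, img src, IMG src, etc. all match.
    static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    static List<String> findSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // exactly one of the three capture groups matched; take whichever it was
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    sources.add(m.group(g));
                    break;
                }
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<IMG BORDER=0 SRC=blahblah.gif> <img src=\"a.gif\"> <img alt='x' src='b.png'>";
        System.out.println(findSources(html)); // prints [blahblah.gif, a.gif, b.png]
    }
}
```

The three alternatives in the last group are the "conditional" part: the regex engine tries each quoting style in turn, so the loop never needs special cases for them.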
« Last Edit: Feb 22nd, 2003, 3:54pm by fry »  
benelek

Re: parser bother
« Reply #8 on: Feb 23rd, 2003, 12:27am »

i haven't had any experience with regular expressions (besides the stuff that usually comes out of my mouth, hehe), would you mind explaining what they involve?
 
skloopy

Re: parser bother
« Reply #9 on: Feb 23rd, 2003, 1:58am »

Thanks for the help. I'll post the result when I'm done.
 
It seems like it's impossible right now to import a library (like a parser) into Processing; I've tried the command-line method. But for me, the split() commands may be all I need.
 
The code might get a little long. It might be really cool if you could have multiple java files in a Processing project. Maybe you could use a page metaphor?
« Last Edit: Feb 23rd, 2003, 1:58am by skloopy »  
Mike Davis

Re: parser bother
« Reply #10 on: Feb 23rd, 2003, 2:00am »

http://etext.lib.virginia.edu/helpsheets/regex.html
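A few of the basic building blocks described on that page, tried out with Java's String.matches() (which only returns true if the pattern matches the whole string). The example strings and patterns are just picks for this post.

```java
public class RegexBasics {
    public static void main(String[] args) {
        // ? makes the previous character optional
        System.out.println("color".matches("colou?r"));               // true
        System.out.println("colour".matches("colou?r"));              // true
        // [0-9] is a character class; + means "one or more"
        System.out.println("0001".matches("[0-9]+"));                 // true
        System.out.println("12a".matches("[0-9]+"));                  // false
        // . is any character, * means "zero or more", | means "or"
        System.out.println("photo.jpg".matches(".*\\.(gif|jpg)"));    // true
        System.out.println("photo.png".matches(".*\\.(gif|jpg)"));    // false
        // (?i) turns on case-insensitive matching
        System.out.println("IMG SRC=x.gif".matches("(?i)img src=.*")); // true
    }
}
```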
 
benelek

Re: parser bother
« Reply #11 on: Feb 23rd, 2003, 5:39am »

cool, thanks Mike.
 
