Processing 1.0 _ALPHA_ - parser bother



	This is the archive Discourse for the Processing (ALPHA) software. Please visit the new Processing forum for current information.

Processing 1.0 _ALPHA_ » Programming Questions & Help » Syntax (Moderators: fry, REAS) » parser bother


Topic: parser bother

ryan*
Guest

parser bother
« on: Feb 20th, 2003, 1:13pm »

Hi, I was wondering if either Fry or Reas could explain how the parser commands work and what they do. I'm trying to work towards a simple HTML parser that just scans for certain things, like images.

splitInts()
splitFloats()
splitStrings()
join()

thanks

skloopy

Re: parser bother
« Reply #1 on: Feb 21st, 2003, 2:19am »

BTW, sorry if this is too much trouble.. :^)

REAS

Re: parser bother
« Reply #2 on: Feb 21st, 2003, 3:40am »

join() is not implemented yet. probably in _52_

all the splits work like this:
splitInts(string to be split, token that is separating)

for example:
String s = "0001+0002";
int[] data = splitInts( s, '+' );
println(data[0]);
println(data[1]);
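To make the behavior concrete outside Processing, here is a plain-Java sketch of what the splits seem to do, based only on Casey's description above. The method bodies are an assumption (they are not the Processing source): splitStrings breaks the string at every separator character, and splitInts additionally parses each piece with Integer.parseInt. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical re-implementations of splitStrings()/splitInts(),
// written from the forum description, not from the Processing source.
class SplitSketch {

    // Split 'text' wherever 'separator' occurs.
    // Pattern.quote keeps characters like '+' or '|' from being read as
    // regular-expression operators by String.split. As with String.split's
    // default, trailing empty pieces are dropped.
    static String[] splitStrings(String text, char separator) {
        return text.split(Pattern.quote(String.valueOf(separator)));
    }

    // Split, then convert each piece to an int ("0001" becomes 1).
    static int[] splitInts(String text, char separator) {
        String[] pieces = splitStrings(text, separator);
        int[] values = new int[pieces.length];
        for (int i = 0; i < pieces.length; i++) {
            values[i] = Integer.parseInt(pieces[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = splitInts("0001+0002", '+');
        System.out.println(data[0]);  // prints 1
        System.out.println(data[1]);  // prints 2
    }
}
```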

let me know if you need more...

skloopy

Re: parser bother
« Reply #3 on: Feb 22nd, 2003, 7:39am »

thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? Basically, all I need to do is take the body of text, search for <img src=" and then get the string from there until the next ". Do you have any tips on how I can do that? I mean, I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in Processing...

benelek

Re: parser bother
« Reply #4 on: Feb 22nd, 2003, 11:33am »

the java spec for the String class provides several good methods for operating on a string:

http://java.sun.com/products/jdk/1.2/docs/api/java/lang/String.html

this may take longer than Casey's way, but if you're intent on using built-in Java stuff...

Code:

String theCode = "bladidada <img src=theAddress>";
int startIndex = theCode.indexOf("<img src=");
int endIndex = theCode.indexOf(">", startIndex);
String theAddress = theCode.substring(startIndex+9, endIndex);
println(theAddress);
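One caveat with this approach: String.indexOf returns -1 when the text isn't found, so on a page without an <img src= tag (or without a closing >) the substring call can throw an exception or return the wrong text. The plain-Java sketch below adds those checks and loops to collect every image address instead of only the first. The class and method names are made up for illustration; like the original, it is case-sensitive and keeps any quotes around the address.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the indexOf()/substring() idea, with -1 checks.
class ImgAddresses {

    static final String TAG = "<img src=";

    // Collect the text between every "<img src=" and the following ">".
    static List<String> findAll(String html) {
        List<String> found = new ArrayList<>();
        int start = html.indexOf(TAG);
        while (start != -1) {                 // -1 means no more tags
            int from = start + TAG.length();  // TAG.length() is the 9 used above
            int end = html.indexOf(">", from);
            if (end == -1) break;             // unterminated tag: stop instead of throwing
            found.add(html.substring(from, end));
            start = html.indexOf(TAG, end);   // continue after this tag
        }
        return found;
    }

    public static void main(String[] args) {
        String theCode = "bladidada <img src=theAddress> more <img src=second.gif>";
        System.out.println(findAll(theCode));  // prints [theAddress, second.gif]
    }
}
```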

-jacob

benelek

Re: parser bother
« Reply #5 on: Feb 22nd, 2003, 11:38am »

mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript?

fry

Re: parser bother
« Reply #6 on: Feb 22nd, 2003, 3:33pm »

on Feb 22nd, 2003, 11:38am, benelek wrote:

mmm, actually this brings me to something that's been nagging me for a while. does anybody know why i can't use " and ' interchangeably in P5, as in javascript?

hm, hadn't even thought of implementing it. you might post that to suggestions and see if others are into it as well.
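Part of the reason lies in the Java underneath Processing: in Java the two quote styles produce different types. Single quotes make a char literal (exactly one character) and double quotes make a String, which is why splitInts(s, '+') earlier in the thread takes '+' in single quotes. A small plain-Java illustration (the class name is hypothetical):

```java
// In Java, and so in Processing code:
//   'x'  is a char, exactly one 16-bit character (a numeric type)
//   "x"  is a String, an object holding zero or more characters
class Quotes {
    public static void main(String[] args) {
        char separator = '+';   // the kind of argument splitInts(s, '+') takes
        String one = "+";       // a one-character String is still not a char

        // A char has to be converted explicitly to compare it with a String.
        System.out.println(String.valueOf(separator).equals(one));  // prints true

        // Because char is numeric, '+' + '+' adds character codes (43 + 43).
        System.out.println('+' + '+');   // prints 86
        // Concatenating Strings joins the text instead.
        System.out.println("+" + "+");   // prints ++

        // char c = "+";    // would not compile: a String is not a char
        // String s = 'ab'; // would not compile: a char literal holds one character
    }
}
```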

fry

Re: parser bother
« Reply #7 on: Feb 22nd, 2003, 3:50pm »

on Feb 22nd, 2003, 7:39am, Ryan wrote:

thanks! Are these the same methods you use to isolate certain words in the Processing parser, or is there another Java or custom class? Basically, all I need to do is take the body of text, search for <img src=" and then get the string from there until the next ". Do you have any tips on how I can do that? I mean, I could just write something that scans brute-force and looks for letter sequences, but you already wrote a parser in Processing...

there are a couple of parsers at work behind the scenes. the one that gives us all the quirky stuff with the code is based on oro-matcher, which uses 'regular expressions' as a more advanced way to do pattern matching. (it's a hack to use oro-matcher, so it's our fault, not theirs.) we need to move to a 'real' parser, which breaks things into a sort of tree based on a grammar; that will eventually fix those bugs.

split() and friends are simple specific-use methods that we included because we find them useful for our own work. so for instance, if you had your entire html file as a String, you could solve your original problem with:

Code:

String pieces[] = splitStrings(htmlstring, '<');
for (int i = 0; i < pieces.length; i++) {
  // uses toLowerCase so that it doesn't care
  // whether it's img src or IMG SRC or IMG src etc
  if (pieces[i].toLowerCase().startsWith("img src=\"")) {
    // this is an image tag
    // 9 is for the number of characters in: img src="
    String filename = pieces[i].substring(9);
    int quote = filename.indexOf("\"");
    // skip tags with no closing quote instead of crashing
    if (quote == -1) continue;
    filename = filename.substring(0, quote);

    // now do something with the filename
  }
}
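To try the split-on-'<' idea outside Processing, here is the same approach in plain Java. String.split("<") stands in for Processing's splitStrings(htmlstring, '<'); that substitution is an assumption made so the sketch runs on its own, and the class and method names are hypothetical. It skips tags that have no closing quote rather than throwing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java version of the split-on-'<' approach.
class SplitImgTags {

    static List<String> imageFiles(String html) {
        List<String> files = new ArrayList<>();
        // each piece starts just after a '<', e.g. IMG SRC="a.gif">
        for (String piece : html.split("<")) {
            // compare in lower case so IMG SRC, img src, Img Src all match
            if (!piece.toLowerCase().startsWith("img src=\"")) continue;
            String rest = piece.substring(9);   // 9 = length of: img src="
            int quote = rest.indexOf('"');
            if (quote == -1) continue;           // no closing quote: skip this tag
            files.add(rest.substring(0, quote));
        }
        return files;
    }

    public static void main(String[] args) {
        String html = "<p>hi</p><IMG SRC=\"a.gif\"><img src=\"b.jpg\" width=10>";
        System.out.println(imageFiles(html));  // prints [a.gif, b.jpg]
    }
}
```

A tag such as <IMG BORDER=0 SRC=blahblha.gif> is silently missed by this version, since that piece doesn't start with img src=".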

this quickly gets messy when you have to make exceptions for whether or not the page designer put quotes around the filename after src=, or if someone uses a tag like <IMG BORDER=0 SRC=blahblha.gif>. it's not difficult but just gets messy.

this is where a more robust parser comes into play. regular expressions allow you to do conditional matching (i.e. i can state that quotes are optional), so they aren't as brittle. a full parser (not just matching) would do a better job of dealing with those quirky scenarios too, since it's easier to specify those exceptional cases in the parser 'grammar'.
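As an illustration of that conditional matching, here is a sketch using java.util.regex, the regular-expression package in the standard Java library since 1.4. This swaps in Java's built-in engine; it is not the oro-matcher library Processing uses internally. The ["']? part makes the opening quote optional, and [^>]*? lets other attributes such as BORDER=0 come before SRC. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexImgSrc {

    // <img          the tag name (matched case-insensitively, see flag below)
    // \b            as a whole word, so <imgfoo does not match
    // [^>]*?        any other attributes, never running past the end of the tag
    // \bsrc\s*=\s*  the src attribute, allowing spaces around '='
    // ["']?         an OPTIONAL opening quote: the conditional part
    // ([^"'\s>]+)   capture the address up to a quote, whitespace, or '>'
    static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']?([^\"'\\s>]+)",
        Pattern.CASE_INSENSITIVE);

    static List<String> imageFiles(String html) {
        List<String> files = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            files.add(m.group(1));  // group 1 is the captured address
        }
        return files;
    }

    public static void main(String[] args) {
        String html = "<img src=\"a.gif\"> <IMG BORDER=0 SRC=blahblha.gif> <img alt='x' src='c.png'>";
        System.out.println(imageFiles(html));  // prints [a.gif, blahblha.gif, c.png]
    }
}
```

It is still only pattern matching: an address containing spaces inside quotes is cut off at the first space, and tags inside HTML comments would still be picked up, which is where a full grammar-based parser does better.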

« Last Edit: Feb 22nd, 2003, 3:54pm by fry »

benelek

Re: parser bother
« Reply #8 on: Feb 23rd, 2003, 12:27am »

i haven't had any experience with regular expressions (besides the stuff that usually comes out of my mouth, hehe), would you mind explaining what they involve?

skloopy

Re: parser bother
« Reply #9 on: Feb 23rd, 2003, 1:58am »

Thanks for the help. I'll post the result when I'm done.

It seems like it's impossible right now to import a library (like a parser) into Processing; I've tried the command line method. But for me, the split() command may be all I need.

The code might get a little long. It might be really cool if you could have multiple Java files in a Processing project. Maybe you could use a page metaphor?

« Last Edit: Feb 23rd, 2003, 1:58am by skloopy »

Mike Davis

Re: parser bother
« Reply #10 on: Feb 23rd, 2003, 2:00am »

http://etext.lib.virginia.edu/helpsheets/regex.html
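For a quick taste of the kind of syntax that help sheet covers, here are a few common constructs tried with Java's String.matches() and String.replaceAll() (plain Java; the class name is hypothetical). Note that matches() only succeeds when the whole string fits the pattern.

```java
class RegexBasics {
    public static void main(String[] args) {
        // matches() is true only if the WHOLE string fits the pattern
        System.out.println("cat".matches("c.t"));      // .   any one character:      prints true
        System.out.println("ct".matches("ca?t"));      // ?   previous item optional: prints true
        System.out.println("caaat".matches("ca*t"));   // *   zero or more:           prints true
        System.out.println("ct".matches("ca+t"));      // +   one or more:            prints false
        System.out.println("2003".matches("[0-9]+"));  // []  a set of characters:    prints true
        System.out.println("gif".matches("gif|jpg"));  // |   either alternative:     prints true

        // replaceAll() rewrites every match; \\s+ means one or more whitespace characters
        System.out.println("parser   bother".replaceAll("\\s+", " "));  // prints parser bother
    }
}
```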

benelek

Re: parser bother
« Reply #11 on: Feb 23rd, 2003, 5:39am »

cool, thanks Mike.
