We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
String parsing problem... (Read 1167 times)
String parsing problem...
Dec 11th, 2006, 12:47pm
 
I was trying to make Processing connect to a web server and randomly display all of the images on the page. It seemed like a fairly simple string parsing task...

I got it to connect to my profile on MySpace and it gets SOME images, but it doesn't find them all...

I think it's a problem with the String class, because printing the page to the debug window shows all of the HTML, and it should be able to find the <img> tags alright.

Any help figuring out or cluing me in on what I'm doing wrong would be greatly appreciated.

Jack Kern

Quote:


import processing.net.*;

Client client;

int imgCount = 0;
String imgArray[];


void setup()
{
 size(600, 600);
 background(0);
 noStroke();
 
 imgArray = new String[500];
 
 
 // Open a TCP socket to the host:
 client = new Client(this, "myspace.com", 80);

 // Print the IP address of the host:
 println(client.ip());

 // Send the HTTP GET request (HTTP header lines must end in CRLF):
 client.write("GET /JacksEvilClone HTTP/1.1\r\n");
 client.write("Host: www.myspace.com\r\n\r\n");
 
 
//  http://login.myspace.com/index.cfm?fuseaction=login.process&Mytoken=2EFC7B41-A9E8-4975-B2ECCCFA8D656F6168924880
}

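void DrawImage( String img )
{
     // Remember the URL, but never write past the end of the fixed-size array.
     if( imgCount < imgArray.length )
     {
       imgArray[imgCount] = img;
     }
     imgCount++;
     println( "Image #" + imgCount + ": " + img );
     
     // loadImage() returns null when the image cannot be loaded.
     PImage b = loadImage( img );
     if( b != null )
     {
       image(b,random(400),random(400));
     }
}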

void FindImages( String string )
{
 // Only tags that start exactly with <img src=" are recognised.
 String marker = "<img src=\"";
 int index = 0;
 // Keep scanning until no further marker is found.
 while( true )
 {
   int imgBeginLoc = string.indexOf(marker, index);
   if( imgBeginLoc == -1 )
   {
     break;
   }
   // The URL starts right after the opening quote and runs to the closing quote.
   int urlBegin = imgBeginLoc + marker.length();
   int imgEndLoc = string.indexOf("\"", urlBegin);
   if( imgEndLoc == -1 )
   {
     println("img end not found");
     break;
   }
   DrawImage( string.substring(urlBegin, imgEndLoc) );
   index = imgEndLoc + 1;
 }
 println("leaving.");
}

void draw()
{

 // Print the results of the GET:
 if (client.available() > 0)
 {
   String page = client.readString();
   FindImages( page );
 }

}


Re: String parsing problem...
Reply #1 - Dec 11th, 2006, 4:14pm
 
I haven't checked your code, but there is the proHTML library, which has the class HtmlImageFinder. I think it will be easier to use that than to parse the HTML yourself.

http://texone.org/prohtml/htmlimagefinder_class_htmlimagefinder.htm
Re: String parsing problem...
Reply #2 - Dec 11th, 2006, 8:04pm
 
Didn't even realize that existed, thanks!
Re: String parsing problem...
Reply #3 - Dec 11th, 2006, 8:26pm
 
The program finds the same number of images on the page using HtmlImageFinder as it does with the code I wrote myself. The problem is that there are way more img tags on the page than it's finding. As far as I can tell it's a problem with string parsing in Java, because it's skipping over a lot of img tags in the page's source.

I'll try using the regular HTML parser and see if it can find more image tags.
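
For comparison, here is a minimal plain-Java sketch of a more tolerant search. It is not the proHTML implementation and not the original sketch's method; the class name ImgSrcFinder and the method findImageSources are made up for illustration. It assumes the misses could come from tags where other attributes precede src, or where single quotes or uppercase tag names are used, which a literal search for <img src= would skip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcFinder {
    // Matches <img ... src="..."> or <img ... src='...'>, case-insensitively,
    // allowing other attributes to appear before src.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE);

    // Returns the src value of every img tag found, in document order.
    public static List<String> findImageSources(String html) {
        List<String> sources = new ArrayList<String>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(2)); // group 2 is the text between the quotes
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<IMG border=\"0\" src='a.gif'><img src=\"b.jpg\" alt=\"x\">";
        System.out.println(findImageSources(html)); // prints [a.gif, b.jpg]
    }
}
```

If this finds the same set of images as the literal search, the tag format is probably not what is causing the difference.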
Re: String parsing problem...
Reply #4 - Dec 11th, 2006, 8:43pm
 
That code seems to find the same images that I see on the page.

Remember that the program sees the page the same way as someone who ISN'T your friend on MySpace, unless you've got code to do the login, grab and re-use the cookies, etc., which you've not included.
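
As a rough sketch of the "re-use the cookies" part only: once a login response has handed back a session cookie, it has to be sent along with every later request. The helper below just builds such a request string in plain Java. The class name RequestBuilder, the method buildGet, and the cookie SESSIONID=abc123 are all hypothetical; the real cookie names and the login exchange itself depend on the site and are not shown.

```java
public class RequestBuilder {
    // Builds an HTTP/1.1 GET request as a string.
    // The cookie argument is a hypothetical "name=value" pair captured from
    // the Set-Cookie header of an earlier login response; pass null for none.
    public static String buildGet(String host, String path, String cookie) {
        StringBuilder req = new StringBuilder();
        req.append("GET ").append(path).append(" HTTP/1.1\r\n");
        req.append("Host: ").append(host).append("\r\n");
        if (cookie != null && cookie.length() > 0) {
            req.append("Cookie: ").append(cookie).append("\r\n");
        }
        req.append("Connection: close\r\n");
        req.append("\r\n"); // a blank line ends the header section
        return req.toString();
    }

    public static void main(String[] args) {
        // SESSIONID=abc123 is a placeholder, not a real MySpace cookie.
        System.out.print(buildGet("www.myspace.com", "/JacksEvilClone", "SESSIONID=abc123"));
    }
}
```

In a Processing sketch the resulting string could be passed to client.write() in place of the hand-written request lines.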
Re: String parsing problem...
Reply #5 - Dec 11th, 2006, 10:45pm
 
Wow, do I feel stupid.

Of course Firefox saves my session! This had me so baffled. I couldn't figure out why I was seeing different results, and it seemed it could only be Java's String class, which surely has been thoroughly tested...

Thanks for the insight!

Jack