We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
parsing google search results
Jan 20th, 2010, 4:17am
 
Hi everybody,

I’m new to Processing, and to programming in general. I have already made a first attempt at drawing a circle. You can see it on openprocessing, id 7047 (I’m not allowed to post links yet). Pretty boring, I know. ;)
Today I wanted to put my new skills to some use. My program should parse the Google search results page and return the number of results for a search string. But when I try to load the URL using loadStrings(), I get the following error message:
Quote:
java.io.IOException: Server returned HTTP response code: 403 for URL: XXXgoogle search urlXXX

What am I doing wrong? Thanks in advance!

See below for further information.
Re: parsing google search results
Reply #1 - Jan 20th, 2010, 4:41am
 
First, I advise you to play a bit more with pure graphics... :) Get familiar with loops, tests, string manipulations, arrays, etc. (Unless I underestimate your abilities.)
It looks like you are trying to run before being able to walk... :) Internet resources seem easy to grab and play with, but even though Processing makes it simpler still, there are a lot of hidden pitfalls.

I know you are not yet able to post URLs, but you can actually do so by omitting the http:// part. So, what URL are you trying to access?
An HTTP 403 error means you are trying to access an Internet resource but the server doesn't allow you to get it. Perhaps it wants a password. Or often Web services require you to request a developer key to use their API. And even so, they often restrict the number of requests per time unit.
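
To make the meaning of the status code concrete, here is a minimal plain-Java sketch (not from the thread) that reads the HTTP status of a request instead of letting it surface as an IOException. The class name StatusCheck and both method names are hypothetical, and the explanations cover only a few common codes.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class for illustration; not part of Processing.
class StatusCheck {
  // Plain-language reading of a few common HTTP status codes.
  static String explain(int code) {
    switch (code) {
      case 200: return "OK: the server returned the resource";
      case 403: return "Forbidden: the server understood the request but refuses to serve it";
      case 404: return "Not Found: the server has nothing at this address";
      default:  return "HTTP " + code;
    }
  }

  // Sends a GET request and returns the status code.
  // getResponseCode() reports 4xx/5xx as numbers rather than throwing.
  static int fetchStatus(String address) throws IOException {
    HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
    try {
      return con.getResponseCode();
    } finally {
      con.disconnect();
    }
  }
}
```

Calling StatusCheck.explain(StatusCheck.fetchStatus(someUrl)) would then print a readable reason; what a given server actually answers depends on the server and can change over time.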
Re: parsing google search results
Reply #2 - Jan 20th, 2010, 5:01am
 
Okay, I made some silly comments to boost my post count. :-/

Here’s the code:
Code:

int initialisieren; // counts draw() calls; int fields start at 0

void draw()
{
  // Only try to load the page on the first frame
  if (initialisieren == 0)
  {
    String[] googlesuche = loadStrings("http://www.google.de/search?q=test");
  }
  initialisieren++;
}


The output:
Code:

java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.google.de/search?q=test


The link to my first attempt:
http://www.openprocessing.org/visuals/?visualID=7047
;)
Re: parsing google search results
Reply #3 - Jan 20th, 2010, 5:07am
 
PhiLho  wrote on Jan 20th, 2010, 4:41am:
It looks like you are trying to run before being able to walk... :)

In Germany there is a phrase that says, you’re growing with your challenge.
I’m sorry for that ugly translation.

PhiLho  wrote on Jan 20th, 2010, 4:41am:
Or often Web services require you to request a developer key to use their API. And even so, they often restrict the number of requests per time unit.

I have already heard about those things, but it’s still Greek to me. I tried to find an answer on Google, but either I asked the wrong question or my English comprehension is worse than I thought, or both. ;)
Re: parsing google search results
Reply #4 - Jan 20th, 2010, 6:02am
 
Try it with a URL that includes an html file (e.g. 'index.html') and you'll find that it works, but you may not get what you expect.

The problem is that loadStrings works with the contents of a text file, so when you use the address of a webpage you get all the underlying HTML code for that page, not just the stuff displayed in your browser...  In the case of the URL you use I suspect it's treating the 'search?q=test' as a file name and obviously not finding a file by that name.

Even if you figure out the right filename to pass, I suspect you'll still be disappointed, as you'll just get the source code, not the returned results.  I have to go to a meeting now (fun!) so I'll leave it to someone else to offer an alternative; though as PhiLho suggested, you do seem to be 'jumping in at the deep end'.  Will you sink or swim?
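
As a rough illustration of the point that a loaded page is raw HTML source that still has to be searched, here is a small plain-Java sketch. It assumes the lines come from something like loadStrings(); the class HtmlTitle and the method extractTitle are hypothetical names, and a regular expression is only a quick approach for simple cases, not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Processing.
class HtmlTitle {
  // Matches <title>...</title>, ignoring case and allowing line breaks inside.
  private static final Pattern TITLE =
      Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // lines: one entry per line of HTML source, as loadStrings() would return it.
  // Returns the page title, or null if the source has no title element.
  static String extractTitle(String[] lines) {
    String html = String.join("\n", lines); // search the page as one string
    Matcher m = TITLE.matcher(html);
    return m.find() ? m.group(1).trim() : null;
  }
}
```

Joining the lines first matters because the text you are looking for may span a line break in the source.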
Re: parsing google search results
Reply #5 - Jan 20th, 2010, 8:21am
 
kaypel wrote on Jan 20th, 2010, 5:07am:
you’re growing with your challenge.

Can hardly argue with that, just make sure you don't bite off more than you can chew (festival of common-sense quotes). :)

Let's try with some simple code, as blindfish suggested:
Code:
void setup()
{
 String[] googlesuche = loadStrings("http://www.google.de/");
 println(googlesuche[0]);
}

OK, it works. Note, since you point to openprocessing: such code won't work in an applet unless you learn how to sign it...

Now I will look into why the request part isn't handled.

[EDIT] Found it. I vaguely suspected this, and it has been confirmed by the "Newbie - How do you request a web page from within Java" thread: you have to make Google believe you are a real Web browser, as it tries to avoid being pestered by stupid bots.
Code:
import java.net.URL;
import java.net.URLConnection;

String QUERY = "http://www.google.de/search?q=Processing";

void setup()
{
  String[] results = null;
  try
  {
    URL url = new URL(QUERY);
    URLConnection connection = url.openConnection();
    // Google rejects pure API requests, so we change the header of the request
    // to make it believe it is requested by a real browser... :)
    connection.setRequestProperty("User-Agent",
        "I am a real browser like Mozilla or MSIE");
    results = loadStrings(connection.getInputStream());
  }
  catch (Exception e) // MalformedURL, IO
  {
    e.printStackTrace();
  }

  // Print the third line, if the response has that many lines
  if (results != null && results.length > 2)
  {
    println(results[2]);
  }
}
Re: parsing google search results
Reply #6 - Jan 21st, 2010, 5:30pm
 
Thank you very much. But I still have problems with Google. :) If you, or anyone else here, have too much time, feel free to have a look at the code (available below). The lines concerning this are commented.

In the meantime I experimented with Bing and got some results. You can try it here: http://www.openprocessing.org/visuals/?visualID=7080.

I’m looking forward to adding collision detection, smarter animation, a better interface … There’s a lot to learn.
Re: parsing google search results
Reply #7 - Jan 22nd, 2010, 2:06am
 
Hi,

there is something called 'HTTPClient' that does all of the work:
http://processing.org/learning/libraries/httpclient.html
So, when you want to access a web page (like Google), you usually need to say who you are; otherwise they send you a "page moved" or something else.

Code:
import processing.net.*;

Client c;
String data;

void setup() {
  size(200, 200);
  background(50);
  fill(200);
  c = new Client(this, "www.google.de", 80); // Connect to server on port 80
  c.write("GET / HTTP/1.1\r\n"); // Use the HTTP "GET" command to ask for a Web page
  // The Host header names the server we are asking, so it must match the host above.
  // HTTP header lines end with \r\n, and an empty line ends the request.
  c.write("Host: www.google.de\r\n\r\n");
}

void draw() {
  if (c.available() > 0) { // If there's incoming data from the server...
    data = c.readString(); // ...then grab it and print it
    println(data);
  }
}


but as people mentioned before, what you get is the source code of the page. You still need to do what a browser would do in order to get something nice out of it.

cheers
Re: parsing google search results
Reply #8 - Jan 25th, 2010, 9:20am
 
I haven’t tried your way timm, but I will. Thank you, too.

Most of the time my program works fine, except the following part:
Code:
String davor = "<span class=\"sb_count\" id=\"count\">1-10 von ";
String danach = " Ergebnissen</span>";
int start = zeichenkette.indexOf(davor);
start += davor.length();
println(start);
int ende = zeichenkette.indexOf(danach,start);
println(ende);
String ergebnis = zeichenkette.substring(start, ende);

The result of these lines differs from day to day. 'start' always finds an index, but 'ende' is sometimes set to '-1' although the searched string is definitely present.
Any suggestions?

The whole code is available here: http://www.openprocessing.org/visuals/?visualID=7080
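
One possible explanation, offered as a guess rather than a diagnosis: if 'davor' is not found, indexOf() returns -1, and adding davor.length() turns that into a positive number, so 'start' can look valid even when the marker is missing. The plain-Java sketch below checks both markers explicitly and adds a looser regular expression that does not depend on the exact "1-10 von " text. The class and method names are hypothetical, and the pattern assumes markup shaped like the snippet quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Processing.
class ResultCount {
  // Returns the text between the two markers, or null if either marker is missing.
  static String extractBetween(String text, String before, String after) {
    int start = text.indexOf(before);
    if (start < 0) return null;      // check BEFORE adding the marker length
    start += before.length();
    int end = text.indexOf(after, start);
    if (end < 0) return null;
    return text.substring(start, end);
  }

  // Looser variant: finds the number between "von" and "Ergebnis" inside the
  // element with id="count" (assumed page structure, taken from the post above).
  private static final Pattern COUNT =
      Pattern.compile("id=\"count\">[^<]*?von\\s+([0-9.]+)\\s+Ergebnis");

  static String findCount(String html) {
    Matcher m = COUNT.matcher(html);
    return m.find() ? m.group(1) : null;
  }
}
```

A null result then signals "marker not found" directly, instead of a substring() call failing later with a confusing index.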
Re: parsing google search results
Reply #9 - Jan 27th, 2010, 2:39am
 
Hmm, I think what you must do first of all is bring all the code you get from Google into one string. I once had the problem that the code came back in multiple "waves" connected by the "fef" command.

If you e.g. grab the code with something like "while (c.available() > 0)", you get more than one string, as the loop is entered more than once. This in turn means that something like this would be in your HTML code (maybe you can check that):
" Ergebnis
fef
sen</span>"

i solved it like this:

Code:
// Assumes these are declared beforehand: String[] data = new String[0]; int count = 0;
while (c.available() > 0) {
  String current_data = c.readString(); // grab data

  // here would be the umlaut transcoding (see code below)

  data = append(data, current_data); // grow the array by one and store the current input
  count++; // counter to see how many times the loop was entered
}
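
A stray line such as "fef" in the middle of the page looks like it could be a chunk-size line from HTTP/1.1 chunked transfer encoding, where the body arrives as pieces that are each preceded by their size in hexadecimal. That is an assumption; nothing in the thread confirms it. Under that assumption, a minimal plain-Java sketch for reassembling such a body might look like this (the class ChunkedBody and the method decode are hypothetical names):

```java
// Hypothetical helper for illustration; not part of Processing.
class ChunkedBody {
  // Decodes a chunked body of the form "<hex size>\r\n<data>\r\n ... 0\r\n\r\n".
  // Chunk extensions after ';' are ignored, trailers are not read.
  // Chunk sizes count bytes; this sketch assumes one char per byte (e.g. ASCII).
  static String decode(String body) {
    StringBuilder out = new StringBuilder();
    int pos = 0;
    while (true) {
      int lineEnd = body.indexOf("\r\n", pos);
      if (lineEnd < 0) break;                        // truncated input: stop
      String sizeField = body.substring(pos, lineEnd);
      int semi = sizeField.indexOf(';');
      if (semi >= 0) sizeField = sizeField.substring(0, semi);
      int size = Integer.parseInt(sizeField.trim(), 16); // throws on non-hex input
      if (size == 0) break;                          // last chunk reached
      int dataStart = lineEnd + 2;
      int dataEnd = Math.min(dataStart + size, body.length());
      out.append(body, dataStart, dataEnd);          // copy this chunk's data
      pos = dataEnd + 2;                             // skip the CRLF after the data
    }
    return out.toString();
  }
}
```

The input is the response body only, after the headers have been removed and all reads have been joined into one string. Sending the request as HTTP/1.0 may be a simpler way around the issue, since chunked encoding belongs to HTTP/1.1.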


Besides, I also had a problem with umlauts, which you might have as well. The following transcoding worked for me (you can put it where it's marked in the first code block):

Code:
try {
  byte[] e = current_data.getBytes();
  String v = new String(e, "utf-8");
  byte[] f = v.getBytes("iso-8859-2");
  String w = new String(f);
  current_data = w;
}
catch (Exception e) {
  println("error");
}


well this at least worked for me...
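
For comparison, a common way to handle umlauts is to decode the raw bytes once with the charset the server actually used, rather than converting a String back and forth. The plain-Java sketch below assumes the page is sent as UTF-8 and that you have the raw bytes available; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Processing.
class Umlauts {
  // Decodes raw bytes as UTF-8 (assumed to be the page's encoding).
  static String decodeUtf8(byte[] raw) {
    return new String(raw, StandardCharsets.UTF_8);
  }

  // Decodes the same bytes as ISO-8859-1, to show what a wrong guess produces.
  static String decodeLatin1(byte[] raw) {
    return new String(raw, StandardCharsets.ISO_8859_1);
  }
}
```

The bytes 0xC3 0xA4 are the UTF-8 form of "ä". Decoded as UTF-8 they give that single character, while decoding them as ISO-8859-1 gives the two characters "Ã¤", which is the typical garbled-umlaut symptom.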