Processing Forum
Hello fellow coders,

I want to parse an HTML page to create some custom statistics. Parsing is not the problem; accessing the HTML directly via loadStrings() is.

Problem: I need to log in before I can view the HTML I want to parse. I figured out that I may need to make some sort of POST request in order to send my login data to the service. How is this feasible with Processing?

Thank you for helping out!
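If the site uses an HTML login form (rather than HTTP Basic auth), the POST request described above can be sketched with plain Java's HttpURLConnection, which a Processing sketch can use directly. This is a minimal sketch under assumptions: the class name, the login URL and the form field names (`username`, `password`) are hypothetical and must be taken from the site's actual `<form>`, and the site is assumed to keep the session in a cookie, which a JVM-wide `CookieManager` stores and sends again on later `HttpURLConnection` requests.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class FormLogin {

  // Build an application/x-www-form-urlencoded body such as "username=a&password=b".
  static String encodeForm(Map<String, String> fields) throws UnsupportedEncodingException {
    StringBuilder body = new StringBuilder();
    for (Map.Entry<String, String> field : fields.entrySet()) {
      if (body.length() > 0) {
        body.append('&');
      }
      body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
          .append('=')
          .append(URLEncoder.encode(field.getValue(), "UTF-8"));
    }
    return body.toString();
  }

  // Install a JVM-wide cookie store: cookies set by the login response are
  // kept and sent again on later HttpURLConnection requests to the same site.
  static void enableCookies() {
    CookieHandler.setDefault(new CookieManager());
  }

  // POST the form fields to loginUrl (hypothetical) and return the HTTP status code.
  static int login(String loginUrl, Map<String, String> fields) throws IOException {
    byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    try {
      return conn.getResponseCode();
    } finally {
      conn.disconnect();
    }
  }
}
```

Call `enableCookies()` once, then `login(...)`, then request the protected page with another `HttpURLConnection` in the same program so the session cookie goes along. Whether `loadStrings()` also picks up that cookie depends on how your Processing version opens URLs, so treat that as something to test rather than a given.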


hey der_muk,

I created this class to handle just such issues:
class HTTPSconnect {
  HttpHost targetHost;
  DefaultHttpClient httpclient;

  AuthScope authScope;
  UsernamePasswordCredentials upc;

  AuthCache authCache;
  BasicScheme basicAuth;

  BasicHttpContext localcontext;
  HttpGet httpget;

  HTTPSconnect () {
    // host name only (no "http://"), then port and scheme
    targetHost = new HttpHost("<<Put your base url here eg: google.com>>", 80, "http");
    httpclient = new DefaultHttpClient();

    // register the username and password for this host
    authScope = new AuthScope(targetHost.getHostName(), targetHost.getPort(), AuthScope.ANY_REALM, "basic");
    upc = new UsernamePasswordCredentials("<<put your username here>>", "<<put your password here>>");
    httpclient.getCredentialsProvider().setCredentials(authScope, upc);

    // cache a Basic auth scheme for the host so the credentials are sent
    // with the first request (preemptive authentication)
    authCache = new BasicAuthCache();
    basicAuth = new BasicScheme();
    authCache.put(targetHost, basicAuth);

    // every request below runs in this context, which carries the auth cache
    localcontext = new BasicHttpContext();
    localcontext.setAttribute(ClientContext.AUTH_CACHE, authCache);
  }

  // GET a path on targetHost and parse the response body as JSON
  JSONObject getStoriesJson (String url) {
    JSONObject jo = null;
    try {
      httpget = new HttpGet(url);
      org.apache.http.HttpResponse response = httpclient.execute(targetHost, httpget, localcontext);

      jo = new JSONObject(EntityUtils.toString(response.getEntity()));
    }
    catch(Exception e) {
      println("this is some error : "+e.getStackTrace());
    }
    return jo;
  }

  // GET an image on targetHost and turn the raw bytes into a PImage
  PImage getImage(String url) {
    PImage img = null;

    try {
      httpget = new HttpGet(url);
      org.apache.http.HttpResponse response = httpclient.execute(targetHost, httpget, localcontext);

      // the response body is the encoded image file (PNG, JPEG, ...)
      byte[] bytes = EntityUtils.toByteArray(response.getEntity());
      ByteArrayInputStream bis = new ByteArrayInputStream(bytes);

      // decode with Java's ImageIO, then copy the ARGB pixels into a PImage
      BufferedImage jimg = ImageIO.read(bis);
      img = new PImage(jimg.getWidth(), jimg.getHeight(), PConstants.ARGB);
      jimg.getRGB(0, 0, img.width, img.height, img.pixels, 0, img.width);
      img.updatePixels();
    }
    catch(Exception e) {
      println("this is some error : "+e.getStackTrace());
    }
    return img;
  }
}


So you'll want to put in your base URL without the http:// or https://, and add your username and password. Then create an HTTPSconnect object. After that you can call getStoriesJson (you could change this method name; I was getting JSON files) with the rest of the URL to the file you are trying to load, i.e. everything after the .com/.edu/.whatever. The method will return a JSONObject, but again you may want to change that to just give a String.
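For reference, HTTP Basic authentication (the "basic" scheme the class above registers) sends an `Authorization` header whose value is `Basic ` followed by the Base64 encoding of `username:password`. The plain-Java sketch here builds that header and sends it with the first request, which is roughly what the AuthCache/BasicScheme setup arranges. Assumptions: Java 8+ (for `java.util.Base64`), hypothetical class and method names, a `path` that starts with `/`, and a UTF-8 response body. It only helps if the site really uses Basic auth; a site with an HTML login form needs a POST instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthFetch {

  // Authorization header value for HTTP Basic auth:
  // "Basic " followed by base64("username:password").
  static String basicAuthHeader(String user, String password) {
    byte[] token = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(token);
  }

  // GET http://host + path with the credentials attached to the very first
  // request, so no 401 challenge round trip is needed.
  static String fetch(String host, String path, String user, String password)
      throws IOException {
    HttpURLConnection conn = (HttpURLConnection)
        URI.create("http://" + host + path).toURL().openConnection();
    conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
    try (InputStream in = conn.getInputStream()) {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] chunk = new byte[8192];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
      return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    } finally {
      conn.disconnect();
    }
  }
}
```

Base64 is an encoding, not encryption, so over plain http (port 80, as in the class above) the password is readable in transit; use https where the site offers it.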

As a bonus, it also works with images behind the password wall. These get a bit tricky because you get a byte array back rather than an actual image, so getImage() decodes the byte array and returns a PImage.
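The decoding step of getImage() can be isolated in plain Java. One detail worth knowing: `ImageIO.read` returns `null` instead of throwing when it cannot decode the bytes (for example when the server sends an HTML login page instead of the image), which in getImage() would show up as a NullPointerException caught by the catch block. This sketch, with hypothetical class names, adds that check and returns the pixels as packed 0xAARRGGBB ints, row by row, the same layout `PImage.pixels` uses.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageBytes {

  // Width, height and ARGB pixels of a decoded image.
  static final class Decoded {
    final int width;
    final int height;
    final int[] pixels;

    Decoded(int width, int height, int[] pixels) {
      this.width = width;
      this.height = height;
      this.pixels = pixels;
    }
  }

  static Decoded decode(byte[] bytes) throws IOException {
    BufferedImage img = ImageIO.read(new ByteArrayInputStream(bytes));
    if (img == null) {
      // No installed ImageIO reader recognised the data.
      throw new IOException("response body is not a readable image");
    }
    int w = img.getWidth();
    int h = img.getHeight();
    int[] pixels = new int[w * h];
    // Copy every pixel as an ARGB int; scansize w means each row takes w entries.
    img.getRGB(0, 0, w, h, pixels, 0, w);
    return new Decoded(w, h, pixels);
  }
}
```

In Processing, the resulting array can be copied into the `pixels` of an image made with `createImage(width, height, ARGB)`, followed by `updatePixels()`, just as getImage() does.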

You'll have to add some imports to the file as well. I have all of these, but some may not be used (I haven't cleaned them up):
import org.apache.http.*;
import org.apache.http.client.*;
import org.apache.http.auth.*;
import org.apache.http.protocol.*;
import org.apache.http.util.*;
import org.apache.http.impl.client.*;
import org.apache.http.impl.auth.*;
import org.apache.http.client.protocol.*;
import org.apache.http.client.methods.*;
import java.io.ByteArrayInputStream;
import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;
hope this helps!
ak
Also, I'm not sure what your HTML parsing process is, but I found jsoup to be crazy useful in getting data out of HTML.

-ak

*DANG*

That was a fast reply! Many thanks for helping, akiersky!
I tried your code, but it gives me trouble: it prints this message (from your try/catch block):

this is some error : [Ljava.lang.StackTraceElement;@1ee2433b
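That line is not the actual error. `e.getStackTrace()` returns an array, and concatenating an array onto a String prints only the array's type name (`[Ljava.lang.StackTraceElement;`), an `@` and a hash code. A small plain-Java demonstration (hypothetical class and method names) of what the catch block prints versus two more informative alternatives; inside a Processing catch block, `e.printStackTrace()` also prints the full trace to the console.

```java
import java.util.Arrays;

final class ErrorText {

  // What the catch block prints: the array's default Object.toString().
  static String asPrinted(Exception e) {
    return "this is some error : " + e.getStackTrace();
  }

  // Exception.toString() gives the exception class name and its message.
  static String withMessage(Exception e) {
    return "this is some error : " + e;
  }

  // Arrays.toString() lists the individual stack frames.
  static String withFrames(Exception e) {
    return "this is some error : " + Arrays.toString(e.getStackTrace());
  }
}
```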

I am also not sure if I used the correct .jar files from the Apache library. It took me some fiddling around until I managed to get the library from HERE. I put the .jar files into the "code" folder, and Processing is now able to load the libraries. I read about some replacements at Apache (http-client vs. http-components). Could this cause the error?

I am using the lib/ .jar files from HttpClient 4.3 (GA)
And Processing 2.0.3

Thanks a bunch!
jsoup is amazing. Remember to set your user agent (otherwise you may get the wrong version of the site) and increase the timeout, as the default timeout seems to be too short when working with servers under load.
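In jsoup both settings live on the connection (its `Connection` interface has `userAgent(String)` and `timeout(int)` methods). When fetching pages yourself instead, the same two settings look like this in plain Java. The User-Agent string, the 15-second timeout and the class name are placeholder assumptions, not values from the thread.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class PoliteConnection {

  // Placeholder values: a browser-like User-Agent if the site serves different
  // markup to unknown clients, and a timeout long enough for a busy server.
  static final String USER_AGENT = "Mozilla/5.0 (compatible; StatsSketch/0.1)";
  static final int TIMEOUT_MS = 15000;

  // Prepare a GET request. openConnection() does not contact the server yet;
  // the request is sent on getInputStream() or getResponseCode().
  static HttpURLConnection open(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setRequestProperty("User-Agent", USER_AGENT);
    conn.setConnectTimeout(TIMEOUT_MS); // max wait to establish the connection
    conn.setReadTimeout(TIMEOUT_MS);    // max wait for data while reading the response
    return conn;
  }
}
```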
The error may come from trying to convert the string into a JSONObject. Is the content you are loading JSON? If not, it may be better to just keep it as a String.

As for the libraries, they are written for Java, so they do take some rearranging to get them to work in Processing. Generally, though, if it compiles, it should work. The try/catch error would come from the actual loading, so it sounds like you have the libs set up correctly.

Try removing this line:
jo = new JSONObject(EntityUtils.toString(response.getEntity()));

and see if you still get the catch error. Without that line the method simply returns null, so to actually get the page content, change the JSONObject return type and variable to String and assign EntityUtils.toString(response.getEntity()) to it directly. That should take care of it.
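Following the advice to keep the response as a String, here is a minimal plain-Java sketch that reads a response body into a String. It does roughly the job of `EntityUtils.toString(response.getEntity())`, except that here the charset is passed in explicitly instead of being taken from the response headers. It also includes a crude, hypothetical `looksLikeJson` check to run before handing the text to a JSON parser: a body that starts with `<` is usually an HTML page (often a login or error page), which would make the JSON conversion fail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class BodyAsString {

  // Read the whole stream and decode it as text in the given charset.
  static String read(InputStream in, Charset charset) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buf.write(chunk, 0, n);
    }
    return new String(buf.toByteArray(), charset);
  }

  // Heuristic only: JSON documents normally start with '{' or '['.
  static boolean looksLikeJson(String body) {
    String s = body.trim();
    return s.startsWith("{") || s.startsWith("[");
  }
}
```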

hope this helps!
ak