Parallel threads with web requests slower than one thread?

Hi,

I am working on a sketch to download pictures from different websites. So far, that works. As some websites are kind of slow, I tried to speed the process up by using several threads, each doing the same job on a different site.

Now the strange thing is that if I let the program run with just one thread, it is much faster than 10 threads doing the same job in parallel.

Is there any limit on web access, threads, … for processing? Or am I misunderstanding something completely?

Thanks for any help.

Answers

  • edited March 2014 Answer ✓

    Rule of thumb: Don't use more threads than your CPU has cores. If your CPU supports Hyper-Threading, you can double the number of threads.

    Threads do have some overhead, but even if you are creating far too many of them, they should still outperform a single thread doing the same work.

    Without having your code I can only assume what's going on, but I think the bottleneck is most probably your internet connection. Just take your favourite Download Manager and start 10 simultaneous downloads - in most cases downloading 10 files at once will take longer than downloading one after the other. Downloading just 2 or 3 files at once may perform better.

    Try to spawn fewer threads to find the optimal number of parallel downloads.

  • Even something as simple as a variable can be a bottleneck if many threads try to access it at the same time! :-S
    When possible, locally cache any shared variable for each thread. Better yet, have individual objects for each thread! :-bd
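
A minimal sketch of the "individual objects for each thread" idea, in plain Java rather than Processing. The names PerThreadCounters and countWithLocalState are made up for illustration, and the counting loop is only a stand-in for real work. Each worker keeps a private counter and writes it to its own array slot exactly once, so no thread touches another thread's data while running:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of the original sketch.
final class PerThreadCounters {

  // Each worker counts into a local variable and publishes the result
  // once, into its own array slot, so workers never share a counter.
  static long countWithLocalState(int workers, int itemsPerWorker) {
    long[] perWorker = new long[workers];
    List<Thread> threads = new ArrayList<>();

    for (int w = 0; w < workers; w++) {
      final int slot = w;
      Thread t = new Thread(() -> {
        long local = 0;                       // thread-private accumulator
        for (int i = 0; i < itemsPerWorker; i++) local++;
        perWorker[slot] = local;              // single write per worker
      });
      threads.add(t);
      t.start();
    }

    long total = 0;
    for (int w = 0; w < workers; w++) {
      try {
        threads.get(w).join();                // after join() the worker's write is visible
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
      total += perWorker[w];
    }
    return total;
  }
}
```

The results are only combined after join(), so the merge step needs no locking either.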

  • Hi, thanks a lot for the answer. I know about the problem with the CPU cores, but that does not seem to be the problem. As the program is mostly waiting for data from the web, the bottleneck seems to be the downloading process. My internet connection is quite fast (1.2 MB/s), but the sketch reaches just about 24 kB/s. I thought that, as the slow part is the server I download the files from, downloading files from several servers at the same time could be a solution. But this does not work. Here are my results loading the same files with one, two, and four similar threads:

    1 thread: 50 044.0 ms
    2 threads: 73 058.0 ms
    4 threads: 172 140.0 ms

    I know that checking the code is a lot to ask, so sorry for posting it nonetheless.

    Get_img_per_country[] getIMG;
    String[] countries;
    int loopcounter=0;
    long timer;
    
    void setup() {
      size(10, 10, P2D);  //Full HD = 1920 * 1080 //
      frameRate(10);
      background(0);
      noLoop();
      File cntrs = new File(dataPath("UrlFolder")); // open the folder with the country lists
      countries = cntrs.list();
      getIMG = new Get_img_per_country[countries.length];
    }
    
    
    
    
    
    //-----------------------------------------------------------------------------
    void draw() { 
    
      timer = millis();
      for (int i =0; i<countries.length; i++) {
      //for (int i =0; i<1; i++) {
    
        getIMG[loopcounter]=new Get_img_per_country(this, countries[loopcounter]);
        getIMG[loopcounter].start();
    
        while ( getIMG[loopcounter].running ()) {
          delay(1000);
        }
        getIMG[loopcounter].quit();
        loopcounter ++;
      }
      println("Procedure finished");
      println("time needed: "+ str(millis()-timer) +"ms");
    }
    
    
    
    //----CLASS Get_img_per_country -----
    public class Get_img_per_country extends Thread { 
      //Find Images on websites
    
      findimage[] FindImage;
    
      PApplet myParentInstance;
      //String[] Data;
      String[] urlList;
      String country;
      int count = 0;
      public boolean running = false;
      int[] jobdone;          // per-thread completion flags (1 = done)
      int threads = 1;        // Number of web requests processed simultaneously
    
    
      //-----INIT-----------------------------------------------
      Get_img_per_country(PApplet me, String ctry) {
        myParentInstance = me;
        country = ctry;
        urlList = loadStrings("UrlFolder/"+country);
        FindImage = new findimage[threads];
        jobdone = new int[threads];
        count = 0;
      }
    
    
    
      //-----RUN-----------------------------------------------
    
      void run() {
        System.out.println("thread "+country +" start downloading ... ");
    
        while (running && urlList.length > 0) {
    
          if (urlList.length < threads) {
            threads = urlList.length;
            FindImage = new findimage[threads];
            jobdone = new int[threads];
          }
    
          for (int i=0; i < threads; i++) {
            FindImage[i] = new findimage(myParentInstance, urlList[urlList.length-1], split(country, "_")[1]); //Download the images of one url
            FindImage[i].start();
            urlList = shorten(urlList);
          }
    
    
          try {
            sleep((long)(1000));
          }
          catch (Exception e) {
          }
    
          System.out.println("!! threads running, waiting for jobdone !!");
    
          while (running && min (jobdone) < 1) {  //wait until all websites have been processed
            for (int i=0; i < threads; i++) {
              jobdone[i]= FindImage[i].jobdone();
              try {
                sleep((long)(100));
              }
              catch (Exception e) {
              }
            }
          }
    
          for (int i=0; i < threads; i++) {
            FindImage[i].quit();
          }
        }
        System.out.println("thread "+country +" is done");
        running = false;  // Setting running to false ends the loop in run()
      }
    
    
      //--------------------------------------------------------
    
    
      public boolean running() {
        return running;
      }
    
    
      void start() {
        running = true;
        count = 0;
        super.start();
      }
    
      void quit() {
        System.out.println("Quitting."); 
        running = false;  // Setting running to false ends the loop in run()
        interrupt();
      }
    }
    
    
    //------CLASS findimage---------------------
    
    public class findimage extends Thread {  
      //Find Images on websites
    
      PApplet myParentInstance;
    
    
      public boolean finding = false;
      String url;
      String [] SiteHTML = new String [0];
      String Countriesname;
      public int jobdone=0;
    
    
    
    
      //-----INIT-----------------------------------------------
      findimage(PApplet me, String JobData, String ctrname) {
        myParentInstance = me;
        Countriesname = split(ctrname, ".")[0]; //strip the file extension
        url=JobData;
        //println(url);
        jobdone = 0;
        // check whether the address works
        try {
          SiteHTML = loadStrings (url);  //--load HTML Code
        }
        catch(Exception e) {  
          System.out.println("! Access to "+url +" denied !");
        }
      }
    
    
    
      //-----Run-----------------------------------------------
      void run() {
        finding = true;
        if (SiteHTML != null) {
          System.out.println("- Got Access to "+url);
          for (int i=0; i< SiteHTML.length; i++) {
            //--------
            if ( SiteHTML[i].indexOf("<img") > -1   &&   SiteHTML[i].indexOf("jpg") > -1   &&   SiteHTML[i].indexOf("src=\"") > -1 ) {
    
              String temp1 [] = split(SiteHTML[i], "<img");
              for (int k=0; k< temp1.length; k++) {
                if ( temp1[k].indexOf("jpg") > -1   &&   temp1[k].indexOf("src=\"") > -1) {     // just look for jpg images
                  String temp2 = split(temp1[k], "src=\"")[1]; // delete code before the image's address
                  temp2 = split(temp2, "\"")[0];               // delete code after the image's address
    
                  if (temp2.indexOf("http://") < 0 ) temp2 = url + "/" + temp2;   //if the image address is a path on the server
    
                  if ( temp2.indexOf("jpg") > -1   &&   temp2.indexOf("http://") == 0 ) { //just to be sure we have a real address
    
                    PImage image = requestImage(temp2);
    
    
                    try {
                      sleep((long)(1000));
                    }
                    catch (Exception e) {
                    }
    
    
                    while (image.width == 0) {  //wait till image is downloaded
    
                      try {
                        sleep((long)(200));
                      }
                      catch (Exception e) {
                      }
                    }
    
    
                    if ( (image.width != -1   &&   image.width * image.height > 6000)   &&   (float(image.width) / float(image.height) < 3.8)   &&   (float(image.height) / float(image.width) < 3) ) { 
    
                      //Try to save
                      String uurl = split(url, "/")[2]; // without http:...
                      try {
                        image.save("Data/Images/"+Countriesname+ "/_"+year()+ "_"+month()+ "_"+day()+ "_"+hour()+ "_"+uurl+"_"+i);
                      }
                      catch(Exception e) { 
                        // e.printStackTrace();
                      }
                    }
                  }
                }
              }
            }
          }
        }
        jobdone = 1;
        finding = false;
      }
    
      //--------------------------------------------------------
    
    
      public int jobdone() {
        return jobdone;
      }
    
    
      void start() {
        finding = true;
        super.start();
      }
    
      void quit() {
        System.out.println("Quitting."); 
        finding = false;  // Setting running to false ends the loop in run()
        jobdone=1;
        interrupt();
      }
    }
    
  • edited March 2014

    @GoToLoop: It's pretty safe/fast to let multiple threads access the same variables. Synchronized classes, and basically anything that needs synchronization and/or is order-dependent, are a problem though. My little multi-threaded particle system for reference: http://forum.processing.org/two/discussion/2531/how-to-improve-a-multi-threaded-particle-system

    @baltensperger: I'll test it, but no promises. ;)

    Okay, can't run your code - got an NPE right from the start, as always without any indication what actually throws the exception. Just love the PDE.

    Looking at your code, you seem to run a thread, wait until it has finished and then start the next one. This way you won't get any performance gain. You have to start all threads at once and wait for all of them to finish.
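
The point about starting everything first and waiting afterwards could look roughly like this in plain Java. This is a sketch under assumptions: fakeDownload just sleeps to imitate a slow server, and the names StartAllThenJoin and runAll are hypothetical, not taken from the posted sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of the original sketch.
final class StartAllThenJoin {

  // Stand-in for one slow web request (assumption: pure waiting time).
  static void fakeDownload(long delayMs) {
    try {
      Thread.sleep(delayMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Starts every worker first, then waits for all of them.
  // Returns the elapsed wall-clock time in milliseconds.
  static long runAll(int workers, long delayMs) {
    long t0 = System.nanoTime();

    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < workers; i++) {
      Thread t = new Thread(() -> fakeDownload(delayMs));
      threads.add(t);
      t.start();                 // do NOT wait here, just start
    }

    for (Thread t : threads) {   // only now wait for everybody
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return (System.nanoTime() - t0) / 1_000_000;
  }
}
```

With four workers that each wait 200 ms, runAll(4, 200) should take roughly the time of one request rather than four, as long as waiting really is the bottleneck.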

  • edited March 2014

    I know that it's safe/fast! I was just exposing the ideal conditions. Any shared access slows threads down! :-&

    Just to prove it, the extra thread of this program locks down the isActive boolean field in a while loop. /:)
    Uncomment its delay(0); statement to open a short time window so that the "Animation" thread
    is able to set isActive = false;! [..]

    P.S.: Added some print(isActive + "\t"); commands, and it seems it's the tight while loop doing nothing
    that doesn't realize isActive is already false! X_X

    // forum.processing.org/two/discussion/2531/
    // how-to-improve-a-multi-threaded-particle-system
    
    boolean isActive = true;
    
    void setup() {
      size(400, 300, JAVA2D);
      frameRate(5);
    
      thread("bottleneck");
    }
    
    void draw() {
      background((color) random(#000000));
    }
    
    void mousePressed() {
      isActive = false;
      print(isActive + "\t");
    }
    
    void keyPressed() {
      isActive = false;
      print(isActive + "\t");
    }
    
    void bottleneck() {
      while (isActive) {
        //delay(0);
      }
    
      exit();
    }
    
  • edited March 2014

    I guess you meant something like the following: It's reacting instantly in both cases.

    Now I'm getting what your point was. You should never ever do something like while(boolean) { } at all. Busy waits will slow down your whole program and effectively use a cached copy of the global boolean you think they are using. Using the keyword volatile fixes this behavior. Of course, having another thread accessing the same variable will slow down your current thread, but this should still outperform any attempt to give each step a copy of this variable and synchronize both.

    Just replacing line 14 with if(isActive) background((color)random(#000000)); proves that isActive is actually changed and the busy wait is using its own copy of the variable.
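
For reference, a small plain-Java sketch of the volatile fix mentioned above. The names VolatileFlag and stopsAfterFlagFlip are invented for this example, and the empty loop is kept on purpose to mirror the busy wait under discussion (it is still not a pattern to copy into real code). Because the field is volatile, each loop iteration has to read the current value instead of a cached one:

```java
// Hypothetical example class; not part of the original sketch.
final class VolatileFlag {

  // volatile: reads in the loop must observe writes made by other threads.
  private static volatile boolean isActive = true;

  // Returns true if the spinning thread noticed the change within timeoutMs.
  static boolean stopsAfterFlagFlip(long timeoutMs) {
    isActive = true;

    Thread spinner = new Thread(() -> {
      while (isActive) {
        // empty busy wait, as in the forum example
      }
    });
    spinner.setDaemon(true);     // do not keep the JVM alive if it never stops
    spinner.start();

    try {
      Thread.sleep(50);          // let the loop run for a moment
      isActive = false;          // flip the flag from another thread
      spinner.join(timeoutMs);   // wait (bounded) for the loop to exit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !spinner.isAlive();
  }
}
```

Without the volatile keyword the JIT is allowed to hoist the read out of the loop, which matches the "own copy" behaviour described in this post.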

  • edited March 2014

    ... and the busy wait is using its own copy of the variable.

    I guess you meant an internally hyper-optimized cache value of the variable! :P

  • edited March 2014

    Exactly (if a theoretically 1 bit long value is still optimizable). The JVM (run time optimization) again, sometimes it helps and sometimes... :D

    Offtopic: Does the strikethrough markup/button work for you?

  • Thanks for the answers. It seems to work now somehow, even though parallel download is not much faster than a single thread.

    findimage[] FindImage;
    
    int threads = 2;        // Number of web requests processed simultaneously
    boolean running;
    int[] jobdone;          // per-thread completion flags (1 = done)
    long timer;
    String country = "Chile";
    String[] urlList = {
      "http://www.lun.com", 
      "http://www.lasegunda.com", 
      "http://www.diariolaprensa.cl", 
      "http://www.latercera.com", 
      "http://www.lahora.cl", 
      "http://www.lacuarta.com", 
      "http://www.theclinic.cl", 
      "http://www.lanacion.cl", 
      "http://www.publimetro.cl", 
      "http://www.elciudadano.cl", 
      "http://www.elclarin.cl/web", 
      "http://www.elmostrador.cl", 
      "http://www.elnaveghable.cl", 
      "http://www.cambio21.cl", 
      "http://www.pichilemunews.cl", 
      "http://www.ilovechile.cl",
    };
    
    
    
    void setup() {
      size(10, 10, P2D);  //Full HD = 1920 * 1080 //
      frameRate(10);
      background(0);
      noLoop();
      running = true;
    }
    
    
    void draw() {
      timer = millis();
      while (running && urlList.length > 0) {
    
        if (urlList.length < threads) threads = urlList.length;
        FindImage = new findimage[threads];
        jobdone = new int[threads];
    
        for (int i=0; i < threads; i++) {  //create the defined number of threads searching the web.
          FindImage[i] = new findimage( urlList[urlList.length-1], country); //Download the images of one url
          jobdone[i]=0;
          urlList = shorten(urlList);
        }
    
        for (int i=0; i < threads; i++) FindImage[i].start();
    
        System.out.println("!! "+threads+" threads running, waiting for jobdone !!");
        while (running && min (jobdone) < 1) {  //wait till all threads finished their job
          for (int i=0; i < threads; i++) jobdone[i]= FindImage[i].complete();
          delay(10);
        }
    
        for (int i=0; i < threads; i++)    FindImage[i].quit();
      }
    
      running = false;  // Setting running to false ends the loop in run()
      println(threads+ "Threads: time needed: "+ str(millis()-timer) +"ms");
      super.stop();
    }
    

    Class FindImage

    public class findimage extends Thread {  
    
      String url;
      String [] SiteHTML = new String [0];
      String Countriesname;
      public int jobdone=0;
    
    
      //-----INIT-----------------------------------------------
      findimage(String website, String ctrname) {
        url=website;
        Countriesname = ctrname;
    
        try {        // check whether the address works
          SiteHTML = loadStrings (url);  //--load HTML Code
        }
        catch(Exception e) {  
          System.out.println("! Access to "+url +" denied !");
        }
      }
    
    
      //-----Run-----------------------------------------------
      void run() {
        if (SiteHTML != null) {
          System.out.println("- Got Access to "+url);
    
          for (int i=0; i< SiteHTML.length; i++) {
            if ( SiteHTML[i].indexOf("<img") > -1   &&   SiteHTML[i].indexOf("jpg") > -1   &&   SiteHTML[i].indexOf("src=\"") > -1 ) {
    
              String temp1 [] = split(SiteHTML[i], "<img");
              for (int k=0; k< temp1.length; k++) {
                if ( temp1[k].indexOf("jpg") > -1   &&   temp1[k].indexOf("src=\"") > -1) {     // just look for jpg images
                  String temp2 = split(temp1[k], "src=\"")[1]; // delete code before the image's address
                  temp2 = split(temp2, "\"")[0];               // delete code after the image's address
    
                  if (temp2.indexOf("http://") < 0 ) { //if the image address is a path on the server, prepend the web address
                    url = "http://" + split(url, "/")[2];
                    temp2 = url + "/" + temp2;
                  }
    
                  if ( (temp2.indexOf("jpg") > -1  || temp2.indexOf("jpeg") > -1)   &&   temp2.indexOf("http://") == 0 ) { //just to be sure we have a real address
                    PImage image = requestImage(temp2);
                    delay(1000);
                    while (image.width == 0) delay(10);
    
    
                    //Save image to disk if it is not a banner...
                    if ( (image.width != -1   &&   image.width * image.height > 6000)   &&   (float(image.width) / float(image.height) < 3.8)   &&   (float(image.height) / float(image.width) < 3) ) { 
                      String shortenurl = split(url, "//")[1]; // without http:...
    
                      try {
                        image.save("Data/Images/"+Countriesname+ "/_"+year()+ "_"+month()+ "_"+day()+ "_"+hour()+ "_"+ minute()+ "_"+ shortenurl+ "_"+ i);
                      }
                      catch(Exception e) { 
                        e.printStackTrace();
                      }
                    }
                  }
                }
              }
            }
          }
        }
        jobdone = 1;
      }
    
      //--------------------------------------------------------
    
      public int complete() {
        return jobdone;
      }
    
      void start() {
        jobdone=0;
        super.start();
      }
    
      void quit() {
        jobdone=1;
        interrupt();
      }
    }
    
  • edited March 2014

    @Poersch : Offtopic: Does the strikethrough markup/button work for you?

    NEVER! Underline was not included either, even though @fjen said that he would include it! :(

    http://forum.processing.org/two/discussion/comment/673#Comment_673

  • ... just use <s>strikethrough</s> and <u>underline</u> for the moment.

  • Oh! That's what I've been doing since the 1st day! But it's really tiresome! I-)

  • edited March 2014

    @GoToLoop & @fjen: Thanks for the info! I can live with HTML markup.

    @baltensperger: Posting your whole code makes it easier for us to help you. Also, using Java's ExecutorService is much cleaner and may even be faster than "normal" threading - at least in this situation. Have a look at my aforementioned particle system: http://forum.processing.org/two/discussion/2531/how-to-improve-a-multi-threaded-particle-system
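
As a rough idea of what the ExecutorService route might look like, here is a plain-Java sketch. Everything specific in it is an assumption for illustration: the class PooledDownloads, the method fetchAll, and fakeFetch, which only sleeps and returns a made-up number instead of loading a page. The pool size plays the role of the threads variable in the sketch above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of the original sketch.
final class PooledDownloads {

  // Placeholder for the real per-site work; returns a made-up "image count".
  static int fakeFetch(String site) {
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return site.length();
  }

  // Runs one job per site on a fixed-size pool and sums the results.
  static int fetchAll(List<String> sites, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Callable<Integer>> jobs = new ArrayList<>();
      for (String site : sites) {
        jobs.add(() -> fakeFetch(site));
      }

      int total = 0;
      // invokeAll blocks until every job has finished.
      for (Future<Integer> f : pool.invokeAll(jobs)) {
        total += f.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a job failed", e.getCause());
    } finally {
      pool.shutdown();           // always release the worker threads
    }
  }
}
```

The pool queues jobs beyond poolSize by itself, so there is no need for the manual batches and jobdone polling; trying poolSize values of 2, 3 or 4 is then a one-argument change.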
