Processing Forum
Hi, I've read up on the code and searched specifically for this error message. I've tried to debug it but can't find the issue. Does anyone know what could be wrong with the code? FYI: I've replaced the URLs I'm using to keep it general.

This program is very simple. It parses the first page for the first HREF it finds, follows that link, opens the new page, does the same search for an HREF, and continues. The issue I'm having is making sure the HREF URL is outside parentheses rather than within them, but the error message is coming from somewhere else.


  1. String[] lines;
  2. String pTag = "<p>";
  3. String hrefTag = "<a href=\"";
  4. int count=0;
  5. int i = 0;
  6. String found;
  7. String rightHREF;
  8. String line;
  9. String[] m1;
  10. String[] m2;
  11. String[] checkPara;
  12. String[] checkPara2;
  13. String para = "\\(";
  14. String para2 = "\\)";
  15. String domain = "http://www.domain.com";
  16. String newHREF = "http://www.domain.com/specific_page";

  17. void setup() {
  18.   search4URL(); // start searching
  19. }

  20. void search4URL() {                            // search url 
  21.   while(count<15) {                             // search this number of urls 
  22.     lines = loadStrings(newHREF);     // Load new URL
  23.     println(newHREF+", i = "+ i + ", found = " + found + ", Array Length = "+ lines.length);                          // Debug, print NEW URL
  24.     //println(lines);
  25.     while(found == null) {                     // do this until you find first URL in body
  26.       m1 = match(lines[i], pTag);              // check each line for <p> tag
  27.       if (m1 != null) {                        // if found a line with <p> then check if it has a url
  28.         //println("----in first tag loop");    // DEBUG
  29.         line = lines[i];                       // extract line to test further
  30.         m2 = match(line, hrefTag);             // check if line has url
  31.         if (m2 != null) {                      // if found a url then test and do extraction
  32.           while(rightHREF == null) {           // Do this until you have extracted correct HREF
  33.           int beginPos = line.indexOf(hrefTag); // Get position of first HREF location
  34.           String textBefore = line.substring(0,beginPos); // Extract text before HREF
  35.           String s = line.substring(beginPos+9);// Extract text after HREF text
  36.           checkPara = match(textBefore, para);  // Check if text before HREF has a "("
  37.           checkPara2 = match(textBefore, para2);  // Check if text before HREF has a ")"
  38.           println("i = "+i+" , checkPara = "+checkPara+" , checkPara2 = "+ checkPara2);
  39.           if (checkPara != null) {              // if there is an opening parentheses before HREF then
  40.             if (checkPara2 != null) {           // and if there is a closing parentheses before HREF then link found is valid
  41.               int endPos = s.indexOf("\"");               // Get position of end of URL
  42.               String href = s.substring(0,endPos);    // Extract URL from text
  43.               newHREF = (domain + href);           // Update newHREF with new URL
  44.               found = "1";                          // Exit while (found) loop
  45.               rightHREF = "1";
  46.             } else {                            // Else then link found is within parentheses.
  47.               line = (s);
  48.             }
  49.           } else {
  50.             int endPos = s.indexOf("\"");               // Get position of end of URL
  51.             String href = s.substring(0,endPos);    // Extract URL from text
  52.             newHREF = (domain + href);           // Update newHREF with new URL
  53.             found = "1";                          // Exit while (found) loop
  54.             rightHREF = "1";
  55.           }
  56.           //println(newHREF);
  57.           m1 = null;                            // Exit m1 loop
  58.           m2 = null;                            // Exit m2 loop
  59.           checkPara = null;
  60.           checkPara2 = null;
  61.         }
  62.         }                                       // End of URL found & Extraction loop  
  63.       }                                         // End of TAG found & find URL loop
  64.       i++;                                      // Lets go to next line in page
  65.     }                                           // End of searching whole page
  66.     i = 0;                                      // reset line counter
  67.     found = null;                               // reset FOUND indicator
  68.     count++;                                    // lets go to next URL
  69.   }
  70. }
ERROR MESSAGE (it stops on line 29, in red above):
  1. http://www.domain.com/specific_page/), i = 0, found = null, Array Length = 100
  2. i = 61 , checkPara = null , checkPara2 = null
  3. http://www.domain.com/second_page, i = 0, found = null, Array Length = 110
  4. processing.app.debug.RunnerException: ArrayIndexOutOfBoundsException: 1080
  5. at processing.app.Sketch.placeException(Sketch.java:1543)
  6. at processing.app.debug.Runner.findException(Runner.java:583)
  7. at processing.app.debug.Runner.reportException(Runner.java:558)
  8. at processing.app.debug.Runner.exception(Runner.java:498)
  9. at processing.app.debug.EventThread.exceptionEvent(EventThread.java:367)
  10. at processing.app.debug.EventThread.handleEvent(EventThread.java:255)
  11. at processing.app.debug.EventThread.run(EventThread.java:89)
  12. Exception in thread "Animation Thread" java.lang.ArrayIndexOutOfBoundsException: 1080
  13. at wiki_philosophy.search4URL(wiki_philosophy.java:52)
  14. at wiki_philosophy.setup(wiki_philosophy.java:43)
  15. at processing.core.PApplet.handleDraw(PApplet.java:1583)
  16. at processing.core.PApplet.run(PApplet.java:1503)
  17. at java.lang.Thread.run(Thread.java:680)


Thanks in advance!

Replies(1)

The loop doesn't stop at the last line of the page; it just keeps incrementing i. To prevent the error, add a condition so the loop only runs while i is less than the number of lines, so you never go "out of bounds". As a failsafe, you could change line 28 into this:
  1.     while (found == null && i < lines.length) {                     // do this until you find first URL in body && max lines
More importantly, found doesn't seem to be set correctly: even on a page that has a URL, the loop just keeps going (at least on the page I tried). So that's probably another thing to look at.
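
Reading the posted code, one likely reason found stays null on the second page is that rightHREF is set to "1" on the first page and never reset, so the inner while (rightHREF == null) loop is skipped from then on (the output above shows no "i = ..." line for the second page, which fits). Below is a minimal sketch in plain Java of one way to avoid both problems: it keeps all search state local to a single call and bounds the loop by the array length. The class name LinkFinder and the method firstHrefOutsideParens are hypothetical names, not part of the original sketch or of Processing, and the parenthesis check is a simplification that only counts parentheses within one line.

```java
class LinkFinder {
    static final String P_TAG = "<p>";
    static final String HREF_TAG = "<a href=\"";

    // Returns the value of the first href found in a <p> line that is not
    // inside parentheses, or null if no such link exists in the page.
    // Assumption: parentheses open and close on the same line.
    static String firstHrefOutsideParens(String[] lines) {
        for (int i = 0; i < lines.length; i++) {   // bounded: cannot index past the array
            String line = lines[i];
            if (!line.contains(P_TAG)) {
                continue;                          // only look at lines with a <p> tag
            }
            int depth = 0;                         // currently open parentheses on this line
            for (int pos = 0; pos < line.length(); pos++) {
                char c = line.charAt(pos);
                if (c == '(') {
                    depth++;
                } else if (c == ')' && depth > 0) {
                    depth--;
                } else if (depth == 0 && line.startsWith(HREF_TAG, pos)) {
                    int start = pos + HREF_TAG.length();
                    int end = line.indexOf('"', start);   // closing quote of the URL
                    if (end < 0) {
                        break;                     // unterminated attribute, skip this line
                    }
                    return line.substring(start, end);
                }
            }
        }
        return null;                               // no usable link on this page
    }
}
```

Inside the while (count < 15) loop, the result could then be checked once per page: if the method returns null, stop following links; otherwise set newHREF to domain plus the returned value. Because nothing is stored in globals between pages, there is no flag that has to be reset.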