Regex pattern matching

MFX
edited March 2017 in Questions about Code

Hi, I found some code on the old forum that I'm trying to modify to pull an image URL from a webpage. The original code matches an image tag of the form

<IMG SRC="image/1207/AR1520_071112friedman900.jpg"

Using

// Optional spaces, the IMG tag and its attribute, capture of the URL, anything after it
Pattern pat = Pattern.compile("\\s*<IMG SRC=\"(image/.*?)\".*");
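A small plain-Java check shows why that pattern uses the lazy `.*?` inside the group. The sample line is made up: it is the IMG tag from above plus a hypothetical `ALT` attribute. With the lazy quantifier the capture stops at the first closing quote. A greedy `.*` backtracks only to the last quote on the line and swallows the extra attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyCaptureDemo {
    // Pattern from the old forum post: optional whitespace, the IMG tag,
    // a lazily captured path starting with "image/", the closing quote, the rest.
    static final Pattern LAZY = Pattern.compile("\\s*<IMG SRC=\"(image/.*?)\".*");

    // Same pattern with a greedy quantifier, for comparison.
    static final Pattern GREEDY = Pattern.compile("\\s*<IMG SRC=\"(image/.*)\".*");

    // Returns group 1 if the whole line matches, otherwise null.
    static String capture(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line with an extra attribute after SRC.
        String line = "  <IMG SRC=\"image/1207/AR1520_071112friedman900.jpg\" ALT=\"photo\">";
        System.out.println(capture(LAZY, line));   // image/1207/AR1520_071112friedman900.jpg
        System.out.println(capture(GREEDY, line)); // also includes: " ALT="photo
    }
}
```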

However, I want to match and extract a tag of the form

<a class="link" href="GOPR0237.JPG">GOPR0237.JPG</a>

(where the four digits between "GOPR" and ".JPG" are unknown). Ultimately, the file name will be appended to the base URL to form

"http://10.5.5.9:8080/DCIM/100GOPRO/GOPRxxxx.JPG" so the image can be downloaded.
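A minimal plain-Java sketch of that step follows. It assumes the file name is always "GOPR", exactly four digits, and an upper-case ".JPG". It captures the `href` value and appends it to the base directory from the post. `GoProUrlBuilder` and `downloadUrl` are hypothetical names used only for illustration. The `\\.` matters: a bare `.` would also accept any other character before `JPG`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoProUrlBuilder {
    // Base directory taken from the post.
    static final String URL_BASE = "http://10.5.5.9:8080/DCIM/100GOPRO/";

    // Capture the href value when it is "GOPR" + exactly four digits + ".JPG".
    // "\\." is a literal dot; a bare "." would match any character there.
    static final Pattern HREF = Pattern.compile("href=\"(GOPR\\d{4}\\.JPG)\"");

    // Returns the full download URL for the first GoPro JPEG link in the line, or null.
    static String downloadUrl(String line) {
        Matcher m = HREF.matcher(line);
        return m.find() ? URL_BASE + m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(downloadUrl("<a class=\"link\" href=\"GOPR0237.JPG\">GOPR0237.JPG</a>"));
        // Rejected because "x" is not a literal dot; prints null.
        System.out.println(downloadUrl("<a class=\"link\" href=\"GOPR0237xJPG\">GOPR0237xJPG</a>"));
    }
}
```

The first call prints `http://10.5.5.9:8080/DCIM/100GOPRO/GOPR0237.JPG`.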

This is the current full code, if anyone's interested:

import processing.net.*;
import java.util.regex.*;
import java.util.*;
import java.text.*;

Client c;
String URL_BASE = "http://10.5.5.9:8080/DCIM/100GOPRO/";  // Where the GoPro stores its images

PImage GoProImage;

Pattern pat = Pattern.compile("\\s*href=\"(GOPR.*?)\".*"); //NOT CURRENTLY WORKING!

void setup()
{

  size(4000, 3000); 
  background(50);
  fill(200);

  DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
  String imageName = "GoPro-" + df.format(new Date()) + ".jpg";

  c=new Client(this, "10.5.5.9", 80);
  c.write("GET /bacpac/SH?t=oakh6214&p=%01 HTTP/1.0\r\n");
  c.write("\r\n");

  String url = findImageURL(URL_BASE);
  print(url);
  if (url != null) // Found
  {
    GoProImage = loadCachedImage(imageName, URL_BASE + url);
  }
}

void draw()
{
  if (GoProImage != null) image(GoProImage, 0, 0); // Nothing to draw if no image URL was found
}

String findImageURL(String pageURL)
{
  String url = null;
  String[] lines = loadStrings(pageURL);
  if (lines == null) return null; // The page could not be loaded
  for (String line : lines)
  {
    Matcher m = pat.matcher(line);
    if (m.matches())
    {
      url = m.group(1);
      break;
    }
  }

  return url;
}

PImage loadCachedImage(String fileName, String url)
{
  PImage img = loadImage(fileName);
  if (img == null) // Not downloaded yet
  {
    img = loadImage(url);
    if (img != null)
    {
      img.save(fileName); // Cache of the file
    } else
    {
      println("Unable to load the image from " + url);
      exit();
    }
  }
  return img;
}
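A plain-Java check with the sample link from the question suggests why the pattern in the sketch never matches. `Matcher.matches()` must consume the whole line. `\\s*href=...` only allows whitespace before `href`, but the line starts with `<a class="link" `. Either prefixing the pattern with `.*` or calling `find()` instead of `matches()` fixes this. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String LINE = "<a class=\"link\" href=\"GOPR0237.JPG\">GOPR0237.JPG</a>";

    // Pattern from the sketch: only whitespace may precede "href".
    static final Pattern ORIGINAL = Pattern.compile("\\s*href=\"(GOPR.*?)\".*");

    // Same pattern, but ".*" lets anything precede "href".
    static final Pattern ANY_PREFIX = Pattern.compile(".*href=\"(GOPR.*?)\".*");

    // matches() succeeds only if the pattern covers the entire input.
    static String viaMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    // find() succeeds if the pattern matches any substring of the input.
    static String viaFind(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(viaMatches(ORIGINAL, LINE));   // null
        System.out.println(viaMatches(ANY_PREFIX, LINE)); // GOPR0237.JPG
        System.out.println(viaFind(ORIGINAL, LINE));      // GOPR0237.JPG
    }
}
```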

Answers

  • MFX
    edited March 2017

    FFS! The original code had

    Pattern pat = Pattern.compile("\\s*href=\"(GOPR.*?)\".*"); //REGEX NOT CURRENTLY WORKING!

    But for some reason a "\" and a "*" got stripped out when pasting into the forum. It seems you need to type three backslashes for two to appear?!

    EDIT: thanks, now sorted.

  • edited March 2017

    @MFX -- have you tried testing your regex and your sample match data using an interactive regex testing tool such as regex101 or RegexBuddy?

    Is it working, or is the problem with the regex pattern itself rather than the Processing code?

  • .*href=\"\(GOPR[0-9]*.JPG\)\".*
    

    If this matches, it sets the first capture group to the file name GOPRxxxx.JPG.

    It assumes there are only digits between GOPR and .JPG, and that the name is all uppercase.

  • MFX
    edited March 2017

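    Thanks, all. I've decided to start from scratch and approach it differently rather than try to adapt someone else's code. This works as a starting point:

    // Load the GoPro's directory listing, one HTML line per array element
    String[] lines = loadStrings("http://10.5.5.9:8080/DCIM/100GOPRO/");
    println("Files found");
    for (int i = 0; i < lines.length; i++)
    {
      // Capture the href value of links whose text starts with "G"
      String[] m = match(lines[i], "href=\"(.*?)\">G");
      if (m != null)
      {
        println(m[1]);  // m[0] is the whole match, m[1] the first group
      }
    }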
    

  • Ah, OK, that works in bash but not yet in Java. Give me a minute...

  • Oh, you don't need to escape the grouping parentheses in Java:

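    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "<a class=\"link\" href=\"GOPR0237.JPG\">GOPR0237.JPG</a>";

    // ".*" on both ends because matches() must consume the whole line;
    // "\\." matches a literal dot before JPG
    Pattern pat = Pattern.compile(".*href=\"(GOPR[0-9]*\\.JPG)\".*");

    Matcher m = pat.matcher(s);
    if (m.matches()) {
      String url = m.group(1);  // the captured file name, e.g. GOPR0237.JPG
      println(url);
    }

    println(m.matches());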
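The starting-point loop in MFX's last code post uses `href=\"(.*?)\">G`. The `">G` part skips links whose text does not start with "G", such as a parent-directory link. One caveat, sketched below in plain Java with a hypothetical listing: if such a link ever shared a line with a GoPro link, the lazy `.*?` would run past the first closing quote and capture both links' markup. `[^\"]*` keeps the capture inside one attribute value. The sketch also loops on `find()`, because a single Processing `match()` call reports only the first match on each line. `HrefScan` and `scan` are illustrative names, not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefScan {
    // Expression from the starting-point loop: ".*?" may cross a closing quote.
    static final Pattern LAZY_ANY = Pattern.compile("href=\"(.*?)\">G");

    // Variant that keeps the capture inside a single attribute value.
    static final Pattern ONE_VALUE = Pattern.compile("href=\"([^\"]*)\">G");

    // Collects every capture found by repeated find() calls across all lines.
    static List<String> scan(Pattern p, String[] lines) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            while (m.find()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical listing: a parent link and a GoPro link on the same line.
        String[] lines = {
            "<a class=\"link\" href=\"../\">Parent Directory</a><a class=\"link\" href=\"GOPR0237.JPG\">GOPR0237.JPG</a>",
            "<a class=\"link\" href=\"GOPR0238.JPG\">GOPR0238.JPG</a>"
        };
        System.out.println(scan(LAZY_ANY, lines));  // first entry starts with ../
        System.out.println(scan(ONE_VALUE, lines)); // [GOPR0237.JPG, GOPR0238.JPG]
    }
}
```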
    