Search strings for words in between a tag?

edited October 2015 in Using Processing

How can I use Processing to search strings (text from a web page) and pull out the words between a pair of tags, for example everything between <p> and </p>?

Answers

  • Answer ✓

    What about a regular expression?

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    
    String input = "<p>some</p><a>example</a><p>string</p>";
    Pattern pattern = Pattern.compile("<p>(.*?)</p>");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find())
    {   
        System.out.println(matcher.group(1));
    }
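
    The pattern above matches only a bare, lowercase <p> whose closing tag is on the same line. Real pages often have attributes on the tag or line breaks inside the paragraph, so here is a rough sketch of a more tolerant variant in plain Java. The class and method names (ParagraphExtractor, extractParagraphs) are made up for illustration, and a regular expression is still not a full HTML parser, so nested tags inside a paragraph are returned as-is.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class; not part of Processing or of the original answer.
    class ParagraphExtractor {
        // <p\b[^>]*> accepts an opening tag with or without attributes, but not <pre>.
        // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE also accepts <P>.
        private static final Pattern PARAGRAPH = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

        // Returns the trimmed text found between each <p ...> and </p> pair.
        static List<String> extractParagraphs(String html) {
            List<String> found = new ArrayList<>();
            Matcher matcher = PARAGRAPH.matcher(html);
            while (matcher.find()) {
                found.add(matcher.group(1).trim());
            }
            return found;
        }

        public static void main(String[] args) {
            String input = "<p class=\"intro\">some</p><a>example</a><P>multi\nline</P>";
            for (String text : extractParagraphs(input)) {
                System.out.println(text);
            }
        }
    }
    ```

    The reluctant quantifier (.*?) is kept from the original pattern, so each match stops at the first </p> instead of running on to the last one in the input.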
    
  • Yeah, this is it. I am using a regular expression in Max MSP, but I can't yet work out how to finish that patch, so I'm trying to do it in Processing instead. I want to extract the text inside a paragraph tag and write it to a text file. This looks like it could work; I will try it out. Thank you.

  • OK, here is my very noob code, sorry. I am just not getting how to use loadStrings() properly, and this doesn't work. How could I break down lines[] so it can be used in this regular expression method?

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    
    String lines[] = loadStrings("http://processing.org/about/index.html");
    Pattern pattern = Pattern.compile("<p>(.*?)</p>");
    
    for (int i = 0; i < lines.length; i++) {
      Matcher matcher = pattern.matcher(lines[i]);
      while (matcher.find())
      {
        System.out.println(matcher.group(1));
      }
    }
    
  • Answer ✓

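    A paragraph can start on one line of the page source and end on another, so matching one line at a time can miss it. You could combine all the lines into one string like this: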

    String completeText ="";
    for (int i = 0; i < lines.length; i++) { 
      completeText += lines[i];
    }
    
  • Answer ✓

    completeText = join(lines," ");

    This combines all the lines and separates them with a space.
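
    The separator matters when a paragraph wraps across lines of the page source. As a small sketch in plain Java, the comparison below uses String.join as a stand-in for Processing's join(lines, " "); the class name LineJoinDemo and the two-line sample input are invented for the example.

    ```java
    // Hypothetical demo class comparing the two ways of combining lines.
    class LineJoinDemo {
        // Mirrors the += loop: lines are glued together with nothing between them.
        static String concatenate(String[] lines) {
            StringBuilder completeText = new StringBuilder();
            for (String line : lines) {
                completeText.append(line);
            }
            return completeText.toString();
        }

        // Plain-Java stand-in for Processing's join(lines, " ").
        static String joinWithSpace(String[] lines) {
            return String.join(" ", lines);
        }

        public static void main(String[] args) {
            // Made-up page source in which one paragraph wraps onto a second line.
            String[] lines = { "<p>first", "second</p>" };
            System.out.println(concatenate(lines));   // <p>firstsecond</p>
            System.out.println(joinWithSpace(lines)); // <p>first second</p>
        }
    }
    ```

    Without a separator, the last word of one line runs into the first word of the next, so joining with a space is probably the safer choice when the extracted text is meant to be read.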

  • Excellent. Thank you all, this works:

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    
    String lines[] = loadStrings("web page url");
    String completeText = "";
    
    for (int i=0; i < lines.length; i++) {
      completeText += lines[i];
    }
    
    Pattern pattern = Pattern.compile("<p>(.*?)</p>");
    Matcher matcher = pattern.matcher(completeText);
    
    while (matcher.find())
    {   
        System.out.println(matcher.group(1));
    }
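
    Since the goal mentioned earlier in the thread was to write the extracted text to a file, here is one possible way to do that step in plain Java, shown only as a sketch. The class ParagraphWriter, its method names, and the file name paragraphs.txt are all assumptions for illustration; inside a Processing sketch, saveStrings() would likely be the more natural call.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class; not part of Processing or of the original posts.
    class ParagraphWriter {
        // Collects every <p>...</p> match, using the same pattern as the thread.
        static List<String> collect(String completeText) {
            List<String> paragraphs = new ArrayList<>();
            Matcher matcher = Pattern.compile("<p>(.*?)</p>").matcher(completeText);
            while (matcher.find()) {
                paragraphs.add(matcher.group(1));
            }
            return paragraphs;
        }

        // Writes one paragraph per line, replacing any existing file content.
        static void save(List<String> paragraphs, Path target) throws IOException {
            Files.write(target, paragraphs, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            String completeText = "<p>some</p><a>example</a><p>string</p>";
            // "paragraphs.txt" is a made-up output file name.
            save(collect(completeText), Paths.get("paragraphs.txt"));
        }
    }
    ```

    Collecting the matches into a list first, instead of printing them inside the loop, keeps the extraction and the output separate, so the same list can be printed, saved, or processed further.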
    