Regex question: works in the regex tester but not in Processing
I am trying to pull names off a website. The names are all in caps, so it seemed like it would be easy, but I am getting a lot of extra text. I tested a solution that worked great in http://regexpal.com, but it does not work in Processing.
Here is my regex:
"(?!LOW)([A-Z][A-Z][A-Z]+)"
I want words that are all caps, three letters or longer, and not the word LOW.
What I am getting instead is lots of single letters (probably pulled from the starts of sentences) and the word LOW.
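For comparison, here is a minimal Java sketch of the requirement stated above (all-caps words of three or more letters, excluding LOW). It assumes whole words are wanted, so it adds `\b` word boundaries, which the tested pattern does not have. Inside a Java string literal each backslash must be doubled (`"\\b"`), something a pattern typed into an online tester never needs. The class name, the `allCapsWords` helper and the sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllCapsWords {
    // Whole words of three or more capital letters, excluding the word LOW.
    // \b (written "\\b" in a Java string) keeps a match from starting or
    // ending in the middle of a longer word; (?!LOW\\b) rejects LOW only
    // when it is the entire word, so LOWELL is still allowed.
    private static final Pattern ALL_CAPS =
            Pattern.compile("\\b(?!LOW\\b)[A-Z]{3,}\\b");

    // Hypothetical helper: returns every match in the order it was found.
    public static List<String> allCapsWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ALL_CAPS.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "Yield was LOW for SMOKY, but A test near PRISCILLA and LOWELL ran.";
        System.out.println(allCapsWords(sample)); // [SMOKY, PRISCILLA, LOWELL]
    }
}
```

With the tested pattern `(?!LOW)([A-Z][A-Z][A-Z]+)` and no boundaries, the same sample yields `OWELL` in place of `LOWELL`: the lookahead only rejects a match that starts at the L, so the engine simply starts one letter later.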
Here is my code:
import prohtml.*;
import java.util.regex.*;

PrintWriter txtfile;
HtmlList htmlList;

void setup() {
  size(100, 100);

  // enter your url here
  htmlList = new HtmlList("http://nuclearweaponarchive.org/Usa/Tests/Nevada.html");

  // Join every page element that contains no '<' (the text, not the tags)
  // into one space-separated string.
  String s_list = "";
  Pattern p = Pattern.compile("<");
  for (int i = 0; i < htmlList.pageList.size(); i++) {
    String s = htmlList.pageList.get(i).toString();
    if (!p.matcher(s).find()) {
      s_list = s_list.concat(s + " ");
    }
  }
  println(s_list);

  // Note: outside a character class, "&&" matches two literal ampersands,
  // and "^" only matches at the very start of s_list.
  Pattern q = Pattern.compile("^KT[A-Z]&&[A-Z]&&[A-Z]+");
  Matcher n = q.matcher(s_list);
  int numMatches = 0;

  // load existing text into memory (loadStrings() returns null if the file is missing)
  String lines[] = loadStrings("nuclear_match.txt");
  txtfile = createWriter("nuclear_match.txt");
  if (lines != null) {
    for (int i = 0; i < lines.length; i++) {
      txtfile.println(lines[i]);
    }
  }

  // append each match on its own line
  while (n.find()) {
    String h = n.group();
    txtfile.println(h);
    numMatches++;
  }
  txtfile.flush();
  txtfile.close();
  println("results: " + str(numMatches));

  // also save the extracted text, one word per line
  String word_list[] = split(s_list, ' ');
  saveStrings("html_input.txt", word_list);
}
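The pattern the code actually compiles, `^KT[A-Z]&&[A-Z]&&[A-Z]+`, is not the one tested on regexpal. In `java.util.regex`, `&&` only means class intersection inside square brackets; anywhere else it is two literal ampersands. Without the `MULTILINE` flag, `^` only matches at the start of the whole input. The small Java program below checks those behaviors; `findsIn` is a hypothetical helper and the sample strings are made up.

```java
import java.util.regex.Pattern;

public class RegexBehaviorDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    public static boolean findsIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String fromSketch = "^KT[A-Z]&&[A-Z]&&[A-Z]+";

        // Outside [...], "&&" is literal, so the input must contain ampersands.
        System.out.println(findsIn(fromSketch, "KTX&&Y&&ZZ")); // true
        System.out.println(findsIn(fromSketch, "KTXYZ"));      // false

        // Inside [...], && intersects classes: [A-Z&&[^L]] is any capital except L.
        System.out.println(Pattern.matches("[A-Z&&[^L]]+", "OW"));  // true
        System.out.println(Pattern.matches("[A-Z&&[^L]]+", "LOW")); // false

        // Without Pattern.MULTILINE, ^ anchors to the start of the whole input.
        System.out.println(findsIn("^ABC", "xyz ABC")); // false
        System.out.println(findsIn("ABC", "xyz ABC"));  // true
    }
}
```

Because the sketch concatenates the whole page into `s_list`, a leading `^` restricts the search to the first few characters of the page.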