Topic: proHTML : dead links (Read 512 times)
nay
proHTML : dead links
« on: Feb 16th, 2005, 3:25pm »
hi all, I'm hoping to use the proHTML library to list dead links. Unfortunately, if I try to parse a link to find out whether it is dead (and it is), the program aborts. Any advice on how I can do this? I would assume that Christian Riekoff's piece "Tree" (http://www.texone.org/tree/index.html), which uses the library, has something to stop it crashing when it hits a dead link. cheers, rene.
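As a rough sketch of the dead-link check itself, independent of proHTML: in plain Java (which Processing sketches are compiled to) you can ask the server for each link's HTTP status code. The class and method names below are hypothetical, and the rule that a link is dead when the request fails or the status is 400 or higher is an assumption; proHTML itself may decide differently.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class DeadLinkChecker {

    // Assumed rule: -1 (reply was not valid HTTP) and 4xx/5xx errors count as dead.
    // Redirects (3xx) count as alive.
    static boolean isDeadStatus(int status) {
        return status < 0 || status >= 400;
    }

    // Returns true if the link cannot be fetched or answers with an error status.
    static boolean isDead(String link) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
            conn.setRequestMethod("HEAD"); // only the status line and headers are needed
            return isDeadStatus(conn.getResponseCode());
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // unknown host, refused connection, malformed or relative URL,
            // or a non-HTTP URL such as file: (the cast fails)
            return true;
        } finally {
            if (conn != null) conn.disconnect();
        }
    }

    // Collects every link in the list that isDead() reports as dead.
    static List<String> deadLinks(List<String> links) {
        List<String> dead = new ArrayList<>();
        for (String link : links) {
            if (isDead(link)) dead.add(link);
        }
        return dead;
    }

    public static void main(String[] args) {
        for (String link : deadLinks(List.of(args))) {
            System.out.println("dead: " + link);
        }
    }
}
```

Compiled and run with the links as command-line arguments, it prints each dead one. Two caveats: some servers answer HEAD requests with 405 Method Not Allowed, which this sketch would report as dead (retrying with GET would avoid that), and no timeouts are set, so an unresponsive server can still stall the loop.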
amoeba
Re: proHTML : dead links
« Reply #1 on: Feb 16th, 2005, 6:55pm »
Do you get any errors when it aborts, such as NullPointerExceptions? If proHTML uses the URL.openConnection() method, there is no way to set a timeout on the connection (at least not in Java 1.4), which can be very bad (i.e. an eternal wait...). I just finished a web crawler project where I ended up using the Apache Jakarta Commons HttpClient library. My spider (aka the useless Universal Digest Machine) is currently happily surfing and printing out little receipts for every page it visits...
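Newer Java versions address the timeout problem directly: since Java 5, `URLConnection` has `setConnectTimeout` and `setReadTimeout`, which did not exist in Java 1.4. A minimal sketch, with a hypothetical `statusWithTimeout` helper and `TIMED_OUT` marker; `main` opens a local server socket that accepts the connection but never replies, standing in for a hung web server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

class TimeoutCheck {

    // Hypothetical result code meaning "no answer within the time limit".
    static final int TIMED_OUT = -2;

    // Fetches the HTTP status of a URL, giving up after timeoutMs
    // milliseconds instead of waiting forever.
    static int statusWithTimeout(String url, int timeoutMs) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs); // limit for establishing the connection
        conn.setReadTimeout(timeoutMs);    // limit for each wait on the response
        try {
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return TIMED_OUT;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Accepts TCP connections (via the OS backlog) but never sends a reply.
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            int status = statusWithTimeout(url, 500);
            System.out.println(status == TIMED_OUT ? "timed out" : "status " + status);
        }
    }
}
```

Here the request gives up after about half a second. With both timeouts left at their default of 0, which means infinite, the same call would wait forever.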
marius watz // amoeba http://processing.unlekker.net/
nay
Re: proHTML : dead links
« Reply #2 on: Feb 17th, 2005, 12:21am »
hi amoeba - nice piece! It looks to me like the error comes from within the proHTML library, but I could be wrong (and often am!):

prohtml.InvalidUrlException: This is not a parsable URL
  at prohtml.HtmlTree.<init>(HtmlTree.java:9
  at Temporary_6243_3356.setup(Temporary_6243_3356.java:9)

It prints the error and quits. It would be great if whatever caused the error printed a message and returned a boolean or something instead of quitting. I'm hoping to get this working with this library, since I have no Java experience, but I'll look into it if I have to...
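The "print an error and return a boolean" behaviour can be added from the outside by wrapping the failing call. A minimal sketch; `SafeCall` and `succeeds` are hypothetical names, and since proHTML is not available here, `URI.create(...).toURL()` stands in for the call that throws. In the sketch from this thread that call would presumably be the `HtmlTree` constructor named in the stack trace.

```java
import java.net.URI;
import java.util.concurrent.Callable;

class SafeCall {

    // Runs a task that might throw. Returns true on success; on failure it
    // prints the exception and returns false, so the caller can carry on.
    static boolean succeeds(Callable<?> task) {
        try {
            task.call();
            return true;
        } catch (Exception e) {
            System.out.println("Error: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        String[] links = { "http://www.texone.org/tree/index.html", "not a url" };
        for (String link : links) {
            // Stand-in for the call that aborts the sketch: throws for
            // strings it cannot turn into a URL (no network access happens).
            boolean ok = succeeds(() -> URI.create(link).toURL());
            System.out.println(link + (ok ? " -> ok" : " -> skipped"));
        }
    }
}
```

The second link prints the exception and "skipped", and the loop moves on instead of stopping the program.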
JohnG
Re: proHTML : dead links
« Reply #3 on: Feb 17th, 2005, 3:26pm »
Just guessing at the syntax, since I'm not 100% sure of Java exceptions, but if you change the call to:
Code:
  try {
    <command that causes program to quit>
  }
  catch (prohtml.InvalidUrlException e) {
    println("Bad Url");
  }
it may work. Like I said, this is just a complete guess, since I've not tried to use exceptions in Java before.
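To see what a narrow catch like this does and doesn't handle, here is a small plain-Java sketch. `java.net.URISyntaxException` plays the role of `prohtml.InvalidUrlException`, and `hostOf` is a hypothetical helper: only that one exception type is caught, so anything else (here a `NullPointerException` from a null argument) still stops the program.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SpecificCatch {

    // Returns the host part of a link, or null if the link can't be parsed.
    // Only URISyntaxException (standing in for prohtml.InvalidUrlException)
    // is caught; any other exception still propagates to the caller.
    static String hostOf(String link) {
        try {
            return new URI(link).getHost();
        } catch (URISyntaxException e) {
            System.out.println("Bad Url: " + link);
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.texone.org/tree/index.html")); // www.texone.org
        System.out.println(hostOf("not a url")); // prints "Bad Url: not a url", then null
        hostOf(null); // NullPointerException: not caught, so the program stops here
    }
}
```

This is the usual reason to name a specific exception: the error you expect is handled quietly, while an unexpected one still shows up instead of being hidden.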
amoeba
Re: proHTML : dead links
« Reply #4 on: Feb 17th, 2005, 3:55pm »
nay: Glad you liked the piece. John is correct about the exception handling. You could make it even more general with the following:
Code:
  try {
    <command block>
  }
  catch (Exception e) {
    println("Exception: " + e);
  }
Exception handling is usually needed when you don't know the data you're dealing with. Just be aware that once an exception has been thrown, you can't assume that all of the commands inside the try block have been carried out: everything after the command that threw is skipped. The best practice is to discard the result and start whatever you're doing again from the beginning. Or, as they say: "Fail gracefully."
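The "discard the result and start again" advice can be made concrete by building the result in a scratch variable and only keeping it once the whole try block has finished. A minimal sketch with hypothetical names (`FailGracefully`, `reload`, `links`), using URI parsing as the step that may throw part-way through.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class FailGracefully {

    // Links from the last fully successful run; only replaced when a new
    // run finishes without an exception.
    static List<URI> links = new ArrayList<>();

    static boolean reload(String[] candidates) {
        List<URI> fresh = new ArrayList<>(); // build into a scratch list
        try {
            for (String c : candidates) {
                fresh.add(new URI(c)); // may throw part-way through the loop
            }
        } catch (Exception e) {
            // Some adds happened and some didn't, so the partial list can't
            // be trusted: drop it and keep the previous one.
            System.out.println("Exception: " + e);
            return false;
        }
        links = fresh; // commit only a complete result
        return true;
    }

    public static void main(String[] args) {
        reload(new String[] { "http://processing.org/", "http://www.texone.org/" });
        reload(new String[] { "http://processing.unlekker.net/", "not a url" });
        System.out.println(links.size()); // 2: the failed second run changed nothing
    }
}
```

Because `links` is only replaced after the loop completes, a failure halfway through leaves the previous, complete list in place rather than a half-filled one.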
« Last Edit: Feb 18th, 2005, 1:36am by amoeba »
nay
Re: proHTML : dead links
« Reply #5 on: Feb 18th, 2005, 7:17pm »
Excellent! Thanks guys, both snippets work! I'll read up on exception handling, not a term I was familiar with. I'm on my way...