get file structure of a website, proHTML?
Nov 1st, 2007, 5:47am
 
hi, I'm trying to make a visualization of a website's file system/structure, but can't seem to get it together. I've looked at everything proHTML can do, read every post about it, and still can't figure out a way to do it. I could parse all the paths linked from a given page, but even then I could be missing files that aren't linked from anywhere. Any suggestions?

thanks
Re: get file structure of a website, proHTML?
Reply #1 - Nov 1st, 2007, 7:40am
 
you are on the right track: collect the links (frames, iframes, form-actions, ...) on each page and follow the ones you've not yet checked. continue until there are no unchecked links left.
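
The loop described above can be sketched roughly like this. It is only a minimal sketch: `SiteCrawler`, `crawl` and the `linksOf` parameter are made-up names, and `linksOf` is a stand-in for whatever actually fetches a page and returns its links (proHTML or anything else), so no real library API is shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class SiteCrawler {
    /**
     * Returns every URL reachable from start, in the order it was discovered.
     * linksOf is a hypothetical link extractor: given a URL, it returns the
     * links found on that page.
     */
    public static Set<String> crawl(String start, Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>();      // every URL discovered so far
        Deque<String> unchecked = new ArrayDeque<>();  // discovered but not yet followed
        seen.add(start);
        unchecked.add(start);
        while (!unchecked.isEmpty()) {
            String url = unchecked.poll();
            for (String link : linksOf.apply(url)) {
                // add() returns false for a URL we already know, so each page is followed once
                if (seen.add(link)) {
                    unchecked.add(link);
                }
            }
        }
        return seen;
    }
}
```

In practice `linksOf` would probably also need to drop links that point to other domains, otherwise the loop wanders off the site being mapped.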

then try getting the directory index for each file's parent directories. if you found this file:

www.blablub.com/some/path/to/file.html

then try these too:
www.blablub.com/some/path/to/

www.blablub.com/some/path/

www.blablub.com/some/

do the same for any linked media (images, ..).
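
Walking up the path like that could look something like the sketch below. `ParentDirs` and `parentDirectories` are names invented for this example, and it assumes plain URLs without query strings. It also returns the site root as the last entry, which the list above leaves implied.

```java
import java.util.ArrayList;
import java.util.List;

public class ParentDirs {
    /** Returns every parent directory of a URL, deepest first, each ending in "/". */
    public static List<String> parentDirectories(String url) {
        List<String> dirs = new ArrayList<>();
        // Skip "scheme://" if present so its slashes are not mistaken for path separators.
        int schemeEnd = url.indexOf("://");
        int hostStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
        int firstSlash = url.indexOf('/', hostStart);  // slash that ends the host name
        if (firstSlash < 0) {
            return dirs;  // bare host, nothing to walk up
        }
        int slash = url.lastIndexOf('/');
        while (slash >= firstSlash) {
            String dir = url.substring(0, slash + 1);
            if (!dir.equals(url)) {  // the input itself may already be a directory
                dirs.add(dir);
            }
            slash = url.lastIndexOf('/', slash - 1);
        }
        return dirs;
    }
}
```

Each returned URL can then be requested to see whether the server hands back a directory listing. Many servers have listings switched off, so an error or a default page there doesn't mean the directory is empty.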

another idea is to search Google for the domain and check the links that pop up there ...

and .. ehem .. an impolite way of looking for additional pages is to check for a robots.txt at the top level and see what's in there.
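
Pulling the paths out of a robots.txt could be sketched as below. `RobotsPaths` and `listedPaths` are made-up names, and the method takes the file's text as a string, so downloading it is left out. It only looks at `Disallow` and `Allow` lines and ignores which user-agent they apply to.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsPaths {
    /** Returns the paths named on Disallow/Allow lines, in file order, without repeats. */
    public static List<String> listedPaths(String robotsTxt) {
        List<String> paths = new ArrayList<>();
        for (String line : robotsTxt.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);  // drop trailing comment
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;  // not a "field: value" line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            boolean isPathRule = field.equalsIgnoreCase("disallow") || field.equalsIgnoreCase("allow");
            // an empty Disallow means "nothing is disallowed", so there is no path to record
            if (isPathRule && !value.isEmpty() && !paths.contains(value)) {
                paths.add(value);
            }
        }
        return paths;
    }
}
```

The values may contain wildcards like `*`, so they are hints about where to look rather than exact URLs.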

but, as far as i know, there's no way to make sure you really get every file that's publicly available via a browser ...

F

Re: get file structure of a website, proHTML?
Reply #2 - Nov 3rd, 2007, 7:42pm
 
awesome thanks.

I managed to build a program that reads the root and makes a visualisation in "bar" format of the folders and files at the root level. It's not perfect but it's getting there... I was going to post it here, but apparently it's too long, so I'll link to it once it's done.