For a statistical research project I've written a program that counts and stores which words are used and how often. Right now it reads from a .txt file, but I would like to extend the statistics to the internet. There are tutorials on how to read the text from a web page, but I just need the written text, not the code part. Is there a distinction that I can use? The program should read every file in a domain. For example, if I give it the link "https://www.processing.org/", it should read the text of the presentations, the tutorials, etc., but not the HTML part.
It started as a project on my favourite singer, but I soon discovered I'm wayyyy too lazy to copy and paste his 100+ songs into a .txt file.
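To make the question concrete, here is a rough sketch of the kind of distinction I mean, written in Python with only the standard library rather than in Processing. It assumes the page is already downloaded as a string, treats everything outside tags as "written text", and skips the contents of script and style tags. The names VisibleTextParser, visible_text, count_words and same_domain are made up for this example, and the word pattern is a crude assumption that only suits English text.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class VisibleTextParser(HTMLParser):
    """Collects readable text and link targets from one HTML page."""

    SKIPPED = {"script", "style"}  # contents of these tags are code, not prose

    def __init__(self):
        super().__init__()
        self.chunks = []       # pieces of visible text, in page order
        self.links = []        # raw href values found on <a> tags
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text between tags arrives here; keep it unless it is script/style code.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return (text, links) for one page given as an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.links


def count_words(text):
    """Count lowercase words; the pattern is a simplification for English."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def same_domain(links, base_url):
    """Resolve links against base_url and keep only those on the same host."""
    host = urlparse(base_url).netloc
    absolute = (urljoin(base_url, link) for link in links)
    return [url for url in absolute if urlparse(url).netloc == host]


if __name__ == "__main__":
    page = "<h1>Hello world</h1><script>var x = 1;</script><p>Hello <a href='/tutorials/'>tutorials</a></p>"
    text, links = visible_text(page)
    print(count_words(text))
    print(same_domain(links, "https://www.processing.org/"))
```

Reading a whole domain would then mean downloading each page returned by same_domain, remembering which ones were already visited, and repeating; that part is left out here. The same idea (drop script and style, drop the tags, keep what is left) should carry over to Processing, but I have not tried it there.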