Spidering the web for information
aueddonline
Member Posts: 611 ■■□□□□□□□□
in Off-Topic
I have a requirement to find restaurants that serve a particular dish. I was thinking about making this a little programming exercise, but I'm not sure where to start or what's legal, etc.
I suppose ultimately what I would like to do is spider through the internet, collecting webpages which look relevant, and then process them further somehow.
I have some programming experience with Java, and I have a book called 'Perl & LWP' on my bookshelf which I know covers writing web spiders.
Has anyone done anything similar in the past? My other thought was Google hacking to make my manual searches yield better results.
What's another word for Thesaurus?
Comments
-
paul78 Member Posts: 3,016 ■■■■■■■■■■
Have fun with the project. There's a cool little company that built a nice business on it too. You may have heard of it - Google
-
wes allen Member Posts: 540 ■■■■■□□□□□
You should look at Python - there are several really good modules for doing this stuff. Or, get an API key from Google and plug into them directly.
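As a rough illustration of the Python route, here is a minimal sketch of the two checks a spider like this would need on each page: is it allowed by the site's robots.txt, and does the page mention the dish? It uses only the standard library (html.parser, urllib.parse, urllib.robotparser). The names PageScanner, scan_page and allowed_by_robots, the "dish-spider" user agent and the example.com URLs are all made up for the example. Fetching pages over the network is left out, and robots.txt is only a politeness convention, so it does not settle the legal question on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class PageScanner(HTMLParser):
    """Collects outgoing links and text content from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)


def scan_page(html, base_url, dish):
    """Return (mentions_dish, links) for one page of HTML."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    text = " ".join(scanner.text_parts).lower()
    # Plain substring match; a real spider would want something smarter.
    return dish.lower() in text, scanner.links


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a site's robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page and robots.txt, hard-coded so nothing is downloaded.
    page = '<p>Try our Laksa!</p><a href="/menu">Menu</a>'
    robots = "User-agent: *\nDisallow: /private/\n"
    print(scan_page(page, "http://example.com/home", "laksa"))
    print(allowed_by_robots(robots, "dish-spider", "http://example.com/menu"))
```

A crawl loop would keep a queue of URLs, skip any that allowed_by_robots rejects, download the rest, and feed the links returned by scan_page back into the queue.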
-
aueddonline Member Posts: 611 ■■□□□□□□□□
wes allen wrote: "You should look at Python - there are several really good modules for doing this stuff. Or, get an API key from google and plug into them directly."
I hadn't come across this API key before; it sounds quite cool. I also found Heritrix here https://webarchive.jira.com/wiki/display/Heritrix/Heritrix;jsessionid=8ECB7A148E458F323231D03868F8C6E4 and got it crawling yesterday. I need to spend a bit more time with it to see if it'll do what I want.