December 19th, 2004, 11:16 PM
The Deep Web
I've been reading up on the deep web (http://www.press.umich.edu/jep/07-01/bergman.html) and found it pretty interesting that what I find through Google and other search engines is really only a handful of the results that are actually out there.
I've tried a few of the "deep web" search engines (http://www.thebighub.com/ and http://aip.completeplanet.com/) but they don't seem to produce the kind of information I would expect. Say I search for a name: they seldom find a useful result. I'm just experimenting here.
Does anyone here often intentionally delve into the "deep web"? Evidently it's not a signposted area (or at least I don't believe it is, from what I've read).
I've written my own search spider before, and I was just wondering how one might go about adding functionality to it to aid in searching deep web resources?
December 21st, 2004, 08:24 PM
i2c - I'm not really familiar with this "deep web", but after skimming the document, it appears that "deep web" is just a catchphrase to describe websites driven largely by their back-end databases.
For example, if you search Google for a topic like "Computer Security Tutorial", chances are that your results won't link to a specific post on AntiOnline. Instead, they would most likely link to AntiOnline's main page. Most web robots avoid indexing links that include variables, so Google would be unlikely to put www.antionline.com/showthread.php?threadid=xxx into its results. When you consider that your threadid is #264727, or roughly the 260,000th thread (~810,000th post), and that there are hundreds more web forums on the Internet with just as many or even more posts, you can see how indexing all of that would be a huge strain not only on the one doing the indexing, but also on the site being indexed.
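To make the "links that include variables" rule concrete, here is a minimal sketch of how a spider might apply it. It assumes the rule is simply "skip any URL with a query string"; the function names are made up for illustration, and real crawlers use more nuanced policies.

```python
from urllib.parse import urlsplit

def has_query_variables(url: str) -> bool:
    """Return True if the URL carries a query string such as ?threadid=123."""
    return bool(urlsplit(url).query)

def filter_indexable(urls):
    """Keep only the links that a conservative spider of this kind would index."""
    return [u for u in urls if not has_query_variables(u)]

links = [
    "http://www.antionline.com/",
    "http://www.antionline.com/showthread.php?threadid=264727",
]
print(filter_indexable(links))
```

Under this assumption the thread link is dropped and only the main page survives, which is exactly the content that ends up in the "deep web".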
There are also websites like LiveJournal, Wikipedia, DeviantArt, MapQuest and IMDB that have gigantic databases that change daily. Indexing these would be best described as painful... There are also private, subscription-only databases that an indexer would be denied access to. All of these make indexing the entire web more difficult, even if you could somehow solve the tremendous data-storage requirements first.
Thus, a Deep Web Search(tm) would probably do its searches using each website's own database. If we search for some topic, the engine would possibly query its own databases to see which websites have databases related to that topic, and then it could send a query to those sites and check whether what is returned is relevant or not. At least that is how I would imagine it would work.
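The two steps imagined above (look up which sites have a related database, then forward the query to each and check the result) could be sketched roughly like this. Everything specific here is an assumption for illustration: the directory entries, the example.com URLs, the parameter names, and the crude relevance check are all hypothetical, and the fetcher is passed in so the sketch runs without network access.

```python
from urllib.parse import urlencode

# Hypothetical directory of sites with searchable databases, keyed by topic.
# The URLs and parameter names are invented for this sketch.
DIRECTORY = [
    {"topics": {"security", "tutorial"},
     "search_url": "http://forum.example.com/search.php", "param": "query"},
    {"topics": {"movies", "actors"},
     "search_url": "http://films.example.org/find", "param": "q"},
]

def pick_sources(query, directory=DIRECTORY):
    """Step 1: consult our own directory for sites whose topics overlap the query."""
    words = set(query.lower().split())
    return [site for site in directory if words & site["topics"]]

def build_request(site, query):
    """Step 2a: build the URL that submits the query to the site's own search."""
    return site["search_url"] + "?" + urlencode({site["param"]: query})

def is_relevant(text, query):
    """Step 2b: crude relevance test, true if every query word appears in the text."""
    lowered = text.lower()
    return all(word in lowered for word in query.lower().split())

def deep_search(query, fetch, directory=DIRECTORY):
    """Send the query to each matching site and keep the URLs with relevant results."""
    hits = []
    for site in pick_sources(query, directory):
        url = build_request(site, query)
        if is_relevant(fetch(url), query):
            hits.append(url)
    return hits

def fake_fetch(url):
    """Stand-in fetcher returning canned text instead of making an HTTP request."""
    if "forum.example.com" in url:
        return "A computer security tutorial thread"
    return ""

print(deep_search("security tutorial", fake_fetch))
```

In a real spider, `fake_fetch` would be replaced by something that actually downloads the page (for example via `urllib.request.urlopen`), and the hard part would be building and maintaining the directory of searchable sites.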
Cheers. Just saw this topic floating around, and decided to contribute something after being scared off by the huge read of your link.
December 21st, 2004, 10:10 PM
Yep, I thought it was a catchphrase. My post was partly inspired by a book (it was rubbish, I'm not gonna recommend it) which described intelligence gathering using the so-called deep web, so I had a look into it, and that link was by far the most comprehensive.
It was just interesting, and I wanted to see how many other people call it the deep web, and whether people intentionally set out to find these links that aren't indexed by search engines...