Google Indexing Binaries too - For good or for bad
During my regular routine of going through news from across the IT world, I stumbled upon the following link:
http://arstechnica.com/news.ars/post/20060710-7225.html
Quote:
Should Google continue to index binary files, despite the potential drawbacks? The company's position is that the more things on the Internet that are searched, the better things are for everyone, and that people shouldn't worry too much about any possible misuse. Still, the more powerful a tool becomes, the more the potential for abuse increases. This applies not only to Google, but to the Internet in general. As always, skeptical computing is the best defense.
The Google bots crawling the web have become capable enough to index the contents of binary files too:
http://homemade-tutorials.blogspot.c...ble-files.html
Are there many unforeseen dangers of malware and spyware being spread with the help of Google's bots staring us in the face now? Just throwing this topic out for discussion.
Search into the deeper end of the world wide web
There is still a vast world wide web left beyond the reach of today's search engines, which I prefer to call the 'Deeper end of the Web'.
The deeper end of the web consists of web pages created dynamically in response to user requests, and of the huge databases linked to those sites. A search engine query can fetch you a static HTML page, matched on its metadata or whatever else, but it cannot pull any response out of the huge amount of data residing in public databases. The search engines are not yet capable of digging that deep into the web.
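To make the point concrete, here is a rough sketch of why a link-following crawler never sees database content. Everything in it is made up for illustration: the page names, the `STATIC_PAGES` and `DATABASE` dictionaries, and the `crawl` and `search` functions are assumptions, not how any real search engine works.

```python
from collections import deque

# Hypothetical in-memory "site": static pages and the links between them.
STATIC_PAGES = {
    "/index.html": ["/about.html", "/search.html"],
    "/about.html": ["/index.html"],
    "/search.html": [],  # a search form; result pages exist only after a query
}

# Hypothetical database sitting behind the search form.
DATABASE = {
    "whale": "Record 17: whale migration data",
    "coral": "Record 42: coral reef survey",
}

def crawl(start):
    """Breadth-first crawl that only follows static links."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def search(term):
    """Builds a result on demand, the way a dynamic site would."""
    return DATABASE.get(term, "no results")

indexed = crawl("/index.html")
print(sorted(indexed))   # only the static pages show up here
print(search("whale"))   # reachable only by asking the site directly
```

The crawler ends up with the three static pages and nothing else, because no link points at a database record. The records only appear when somebody actually submits a query.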
Quote:
Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.
Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, the page must be static and linked to other pages. Traditional search engines can not "see" or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.
The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search. BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.
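The "dozens of direct queries simultaneously" idea from the quote can be sketched with a thread pool. This is only my guess at the general technique, not BrightPlanet's actual implementation; the `SOURCES` data and the `query_source` and `query_all` functions are hypothetical stand-ins for real searchable databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for searchable deep Web databases.
SOURCES = {
    "patents": {"laser": ["patent 101"], "solar": ["patent 202"]},
    "journals": {"laser": ["article A"], "solar": []},
}

def query_source(name, term):
    """Simulates one direct query against one database."""
    return name, SOURCES[name].get(term, [])

def query_all(term, max_workers=4):
    """Sends the same query to every source in parallel and merges the hits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_source, name, term) for name in SOURCES]
        return {name: hits for name, hits in (f.result() for f in futures)}

print(query_all("laser"))
```

In a real tool each `query_source` call would be a network request to a different site's search form, which is where running them in parallel threads would save time.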
This is an interesting paper on this topic.
P.S. I'm posting slightly off-topic but interesting info in this post only to keep the spirit of the thread alive.
Peace