Google Indexing Binaries too - For good or for bad


July 11th, 2006, 10:40 AM
hardcode121


During my regular routine of going through news across the IT world, I stumbled upon the following link:

http://arstechnica.com/news.ars/post/20060710-7225.html

Quote:

Should Google continue to index binary files, despite the potential drawbacks? The company's position is that the more things on the Internet that are searched, the better things are for everyone, and that people shouldn't worry too much about any possible misuse. Still, the more powerful a tool becomes, the more the potential for abuse increases. This applies not only to Google, but to the Internet in general. As always, skeptical computing is the best defense.

Google's bots crawling the web have become capable of indexing the contents of binary files too:

http://homemade-tutorials.blogspot.c...ble-files.html
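Neither the article nor Google spells out exactly how the crawler handles executables. As a rough illustration only, here is a minimal Python sketch assuming a `strings`-style approach: pull runs of printable ASCII out of the raw bytes and index the words found in them. The function names `extract_strings` and `build_index` and the sample bytes are hypothetical.

```python
import re

# Runs of at least 4 printable ASCII bytes (space through tilde), roughly
# what the Unix `strings` utility reports by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> list[str]:
    """Return the printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def build_index(files: dict[str, bytes]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of file names that contain it."""
    index: dict[str, set[str]] = {}
    for name, data in files.items():
        for run in extract_strings(data):
            for word in re.findall(r"\w+", run.lower()):
                index.setdefault(word, set()).add(name)
    return index


# Hypothetical bytes resembling a small Windows executable.
SAMPLE = b"\x00\x01MZ\x90\x00This program cannot be run in DOS mode\x00\xffBagle\x00"
print(extract_strings(SAMPLE))
print(build_index({"sample.exe": SAMPLE})["bagle"])
```

The point is that once text like this is pulled out of a binary, a search engine can match queries against it just as it does for HTML.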

Are there many unforeseen dangers of malware and spyware being spread with the help of Google's bots, now staring us straight in the eye? Just throwing this topic out for discussion.
July 11th, 2006, 12:04 PM
brokencrow

There are going to be pluses and there are going to be minuses with Google indexing binaries.

The downside is some 'genius' will figure out a way to socially engineer the feature into a spyware/malware setup.

The upside is that such indexing casts a light into those shadowy corners of the web that harbor spyware and malware, not to mention warez.
July 11th, 2006, 05:07 PM
HTRegz

Hey Hey,

There are no drawbacks to Google providing binary search capabilities... Human stupidity has always existed, and to say that a drawback of the searching is human stupidity (i.e. the ability to be socially engineered) isn't right... You'd have to say that everything has the drawback of human stupidity... It's one of those global constants that you just can't factor in... it always exists. I could say that this forum's drawback is that I could socially engineer someone out of their password... that's human stupidity, not a drawback. Whoever wrote the initial article is a drawback (they're part of the problem of human stupidity)...

Peace,
HT
July 12th, 2006, 06:31 AM
brokencrow

Quote:

I could say that this forum's drawback is that I could socially engineer someone out of their password... that's human stupidity...

The question then is, whose stupidity? ;)
July 12th, 2006, 02:10 PM
HTRegz

Quote:

Originally posted here by brokencrow
The question then is, whose stupidity? ;)

Judging by that comment... I'm going to say yours :D
July 12th, 2006, 03:20 PM
hardcode121

Search into the deeper end of the world wide web

There is still a vast part of the World Wide Web beyond the reach of today's search engines, which I prefer to call the 'Deeper end of the Web'.

The deeper end of the web consists of web pages created dynamically in response to user requests, and of the huge databases linked to those sites. A search engine query can fetch you a static HTML page (or other content, based on its metadata or whatever), but it cannot retrieve anything from the huge amounts of data residing in public databases. Search engines are not yet capable of digging this deep into the web.
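To make the "static and linked" point concrete, here is a minimal sketch of a link-following (breadth-first) crawler over a hypothetical in-memory site; the `LINKS` map and all page paths are made up. It discovers every page reachable through links, but a result page that only exists after someone submits a search form is never found.

```python
from collections import deque

# Hypothetical toy site: each static page lists the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/search"],  # a search form, not a list of results
    "/search": [],
}


def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: only pages reachable via links are discovered."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# A result page such as "/search?q=widgets" exists only once someone submits
# the form, so no link points at it and the crawler never sees it.
indexed = crawl("/", LINKS)
print(sorted(indexed))  # ['/', '/about', '/products', '/search']
```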

Quote:

Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.

Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, the page must be static and linked to other pages. Traditional search engines can not "see" or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden.

The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a "one at a time" laborious way to search. BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.

This is from an interesting paper on this topic.
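The "dozens of direct queries simultaneously" idea can be sketched with Python's standard thread pool. This is not BrightPlanet's technology, just a generic fan-out under stated assumptions: `DATABASES` and `query()` are hypothetical stand-ins for real searchable sites, which in practice would be reached over HTTP through their query forms.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for several searchable "deep Web" databases.
DATABASES = {
    "patents": {"engine": ["P-101"], "valve": ["P-202"]},
    "journals": {"engine": ["J-7", "J-9"]},
    "catalog": {"valve": ["C-3"]},
}


def query(source: str, term: str) -> list[str]:
    """One direct query: the records a single source generates for a term."""
    return DATABASES[source].get(term, [])


def federated_search(term: str, max_workers: int = 8) -> dict[str, list[str]]:
    """Send the same query to every source concurrently and collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {src: pool.submit(query, src, term) for src in DATABASES}
        return {src: fut.result() for src, fut in futures.items()}


print(federated_search("engine"))
# {'patents': ['P-101'], 'journals': ['J-7', 'J-9'], 'catalog': []}
```

Threads suit this kind of job because each real query would spend most of its time waiting on the network rather than using the CPU.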

P.S. I'm posting slightly off-topic but interesting info here only to keep the spirit of the thread alive.

Peace
July 12th, 2006, 04:28 PM
brokencrow

I wonder to what extent a robots.txt file will keep Google and other search engines out of websites, or parts of them.

I use robots.txt files on some of my sites, particularly the stuff I locally host via a DSL connection. They seem to work.
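For anyone wanting to test their own rules, Python's standard library includes `urllib.robotparser`, which applies robots.txt rules much as a compliant crawler would. Keep in mind that robots.txt is only a request: well-behaved crawlers such as Googlebot honor it, but nothing stops a client that chooses to ignore it. The rules and the `example.com` URLs below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /private/, and keep
# Googlebot out of /downloads/ as well. A crawler uses the group that names
# it, so /private/ is repeated in the Googlebot group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /downloads/
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/downloads/tool.exe", "/private/notes.txt"):
        allowed = parser.can_fetch(agent, "http://example.com" + path)
        print(f"{agent:13} {path:20} {'allowed' if allowed else 'blocked'}")
```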
July 18th, 2006, 09:34 AM
HTRegz

Those of you interested in experimenting in this may wish to check out http://metasploit.com/research/misc/mwsearch/?q=bagle

Peace,
HT