August 7th, 2004 01:31 AM
Some background first...
Google as a Hacking Tool by 3rr0r:
Google Hacking Honeypots:
Google hacking and Credit Card Security:
Google: Net Hacker Tool:
Google Aids Hackers:
Google is watching you:
It seems Google is becoming a problem for some webmasters. I decided to check out what Google knew about the site I took over, and I wrote this tutorial as a reference while I worked.
Control the Spiders
Nearly all crawlers work with something called the Robots Exclusion Standard, which allows webmasters to determine which parts of their website are indexed.
To do this, we stick a text file called robots.txt at the top level of our document root folder. Here is an example file:
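The example file itself seems to have been lost; a minimal reconstruction, using the standard allow-all form (the * matches every crawler, and an empty Disallow value excludes nothing):

```
User-agent: *
Disallow:
```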
This code sucks: it allows all crawlers to index whatever they want. Let's write a file that denies all crawlers instead.
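The deny-all file, in its standard form, looks like this:

```
User-agent: *
Disallow: /
```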
Notice the slash: it tells all crawlers to ignore everything under the document root folder.
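To exclude only certain directories instead, list one Disallow line per path. A sketch matching the description that follows (one record, two excluded directories):

```
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
```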
This code tells crawlers to ignore everything under the admin and cgi-bin folders in the document root. Now let's define which crawlers we do and don't like. Each block of directives is called a record, and hard returns matter for it to work: separate records with exactly one blank line.
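A sketch with two records, assuming we want to shut out Google's crawler while leaving every other crawler free (note the single blank line separating the records):

```
# Deny Google's crawler everything
User-agent: Googlebot
Disallow: /

# Allow everyone else everything
User-agent: *
Disallow:
```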
You can also deny access to a single file — here, to Google's spiders specifically:
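A reconstruction along those lines (the filename private.html is illustrative):

```
# Denies access to Google's spiders
User-agent: Googlebot
Disallow: /private.html
```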
Note that wildcards only work in the "User-agent" line.
Meta Tag Crawler Denial
You may not have permission to put a robots.txt file in the document root of your webserver. In that case you can use meta tags instead, though crawlers do not support them as consistently. This is simple: place one of these meta tags in your pages:
Permission to index, and follow links:
<meta name="robots" content="index,follow">
Do not index, permission to follow links:
<meta name="robots" content="noindex,follow">
Permission to index, do not follow links:
<meta name="robots" content="index,nofollow">
Do not index, do not follow links:
<meta name="robots" content="noindex,nofollow">
This method is a lot more work and is not as well supported, but it requires no special permission to set up.
Dumping info in Google
This is an easy trick, though not practical for large sites. Enter this into the Google search engine:
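The query in question is Google's site: operator, which restricts results to a single domain (example.com is a placeholder for your own):

```
site:example.com
```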
You'll see that it dumps everything Google knows about your site. If your site isn't too popular, you can skim through the results to see what it knows.
To use the next tool, SiteDigger, you need to register for a Google API license key. Get it done here:
SiteDigger can be found here-
Install SiteDigger and enter your license key in the bottom-right corner. After that, update your signatures via Options, Update Signatures. Enter your domain where it says "please enter your domain" and click Search.
SiteDigger runs automated searches against your domain using those signatures, looking for common indexing mistakes left behind by webmasters. Hackers use this tool, so you should too. Anything it finds should be handled accordingly.
In short, learn to protect your public files. Learn to use .htaccess files for Apache webservers here-
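As a starting point, here is a minimal .htaccess sketch that password-protects a directory under Apache (the path and realm name are illustrative; the password file itself must be created with the htpasswd utility):

```
# Require a valid login for everything in this directory
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```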
Comments and criticisms encouraged.