April 27th, 2006, 12:14 PM
#5
A good example of how to write a robots.txt file is the way the White House does theirs.
There is a lot that they don't want cached. I'll leave the conspiracy theorists to wonder why...
http://www.whitehouse.gov/robots.txt
Google also has a pretty extensive one.
http://www.google.com/robots.txt
AO doesn't block much at all... looks like they block only for performance reasons, i.e. bots that may use too many resources?
http://www.antionline.com/robots.txt
So, if you want more exposure, let them crawl. If not, then block them.
Oh, some "offline" website browsers (such as httrack) also obey the robots.txt file. So, if you don't want someone copying your whole website, you can block them, at least with their "default" settings. Most of them can be reconfigured at will to ignore robots.txt files entirely.
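To make that concrete, here is a rough sketch of how a well-behaved crawler decides whether it may fetch a page. It uses Python's standard urllib.robotparser. The robots.txt contents, the example.com URL, and the plain "HTTrack" agent token are my own assumptions for illustration, not anything copied from the sites above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one offline copier out completely,
# and keep every other bot out of /cgi-bin/ only.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("HTTrack", "Googlebot"):
        for path in ("/index.html", "/cgi-bin/search.cgi"):
            url = "http://www.example.com" + path  # placeholder host
            print(agent, path, allowed(agent, url))
```

Keep in mind this only models a client that chooses to ask. Anything that never checks robots.txt is not slowed down by it at all.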
On the security side of things... be careful what info you put in your robots.txt file. If you have directories that are "hidden" and not linked from anywhere on your website, there is no reason to put their paths in your robots.txt file, as they will then become public knowledge.
A lot can be told about the layout of one's site by the robots.txt file: which directories they keep their scripts in, their images, etc. Also, if you don't want a directory "cached"... an attacker will think "why not?" and may investigate those directories first, looking for "private" goods.
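Just to show how little effort that takes, here is a minimal sketch that pulls every Disallow path out of a robots.txt file. The sample file and its directory names are made up for the example, and the parsing is deliberately simple (it ignores which User-agent each rule belongs to).

```python
from typing import List

# Made-up robots.txt for illustration; the paths are hypothetical.
SAMPLE = """\
User-agent: *
Disallow: /images/
Disallow: /scripts/
Disallow: /private-reports/   # not linked anywhere on the site
"""


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect the path of every non-empty Disallow line, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

An unlinked directory like the last one would otherwise be hard to find, but once it is listed, anyone who reads the file has it.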
Quitmzilla is a Firefox extension that gives you stats on how long you have quit smoking, how much money you've saved, how much you haven't smoked and recent milestones. Very helpful for people who quit smoking and used to smoke at their computers... Helps out with the urges.