December 4th, 2003, 07:10 PM
OK, just a quick note for those not aware of what a robots.txt file is...
Basically, robots.txt is a text file a webmaster places in the root directory of his site which tells those search engines that honour it which areas of the site he is OK with them indexing.
Q. Do I have to have a robots.txt file to get ranked on a search engine??
A. No, you do not. If no robots.txt file is present, the search engine or spider will just assume that it is OK for it to index your whole site.
Q. Will a robots.txt file help my site get looked at by more search engines??
A. No - in order to view the file they must already be at your site
Q. So what do I need it for then??
A. Well, it allows you to designate certain areas which you do not wish to be indexed by the search engine. Perhaps you have a test folder where you simply try out new things that you don't want the public to be aware of, etc.
Q. Ok so how do I make one then?
A. Easy, just open a text editor like Notepad and type the following
You can add as many directories to this as you wish.
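For anyone who would rather generate the file than type it by hand, here is a rough sketch in Python. It is not the listing from the post above; it just builds the standard "User-agent" / "Disallow" lines, one Disallow per directory. The function name build_robots_txt and the folder names "test" and "drafts" are made up for illustration.

```python
def build_robots_txt(disallowed_dirs, user_agent="*"):
    """Return robots.txt text asking the named crawler to skip each directory."""
    lines = [f"User-agent: {user_agent}"]
    for directory in disallowed_dirs:
        # Normalise to a leading and trailing slash so the rule covers the whole folder.
        path = "/" + directory.strip("/") + "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "test" and "drafts" are hypothetical folder names.
    print(build_robots_txt(["test", "drafts"]), end="")
```

Save the output as robots.txt in the root directory of the site. Using "*" as the user agent addresses every spider that reads the file.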
Q. OK, I have written my file, but how do I know it is working?
A. Simply upload it into the root dir of your site and that should be it. If you wish to check the syntax and make sure everything is correct, you can visit here
or for more info on robots.txt try here
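If you would rather check the rules locally, one option (an assumption on my part, not the checker linked above) is Python's standard urllib.robotparser module. The sketch below parses a sample file and asks whether a spider would be allowed to fetch a given URL. The file contents, the helper name is_allowed and the www.example.com addresses are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents; substitute your own robots.txt text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/test/page.html"):
        print(url, is_allowed(ROBOTS_TXT, url))
```

To test the copy that is actually on your server, the same class can download it with set_url() followed by read() instead of parse().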
December 4th, 2003, 08:31 PM
December 4th, 2003, 09:28 PM
Nice. I'll be sure to do that. Thanks Val!
December 4th, 2003, 11:48 PM
w00t. Good info. I will use that later on when I start my site.
Real security doesn't come with an installer.
December 4th, 2003, 11:57 PM
Er... Be careful with this..... let's think about it for a minute.....
You have a set of subfolders on your website that you don't want accessed as a result of a simple web search... So you pop them in the old robots.txt and sit back fat, dumb and happy. Well, if Mr. H4x0|2 wants to know whether you have anything to "hide", all he does is request robots.txt and bingo, there are those closely held secrets for him to see... It's a two-edged sword...
The Moral: Don't place stuff on a web site that you can't have _anyone_ see and still sleep at night......
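To make the point concrete, here is a small sketch of how little effort it takes to turn a robots.txt file into a list of "interesting" folders. It only does simple string handling on text you pass in; the function name disallowed_paths and the sample paths are invented for the example.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in the robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after a '#', which starts a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical file contents.
    sample = "User-agent: *\nDisallow: /test/\nDisallow: /private/\n"
    print(disallowed_paths(sample))
```

Anything that shows up in that list is public knowledge, so it needs real access control, not just a Disallow line.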
Don't SYN us.... We'll SYN you.....
"A nation that draws too broad a difference between its scholars and its warriors will have its thinking done by cowards, and its fighting done by fools." - Thucydides
December 4th, 2003, 11:59 PM
Hey, that was a great little tute mate. Thanks for posting. I've been doing a lot of web stuff for years, so I knew what it was, but my girlfriend has had a page of her site (about 200,000 hits per day) where she didn't have a robot command and some of the members found out some secrets as to what she was planning to do with the site update-wise. Again, thanks mate.
December 7th, 2003, 08:13 AM
Hey Val, does this help on Geocities or Angelfire sites where the txt is not present originally?
If not, tell me something to improve such free sites.
December 7th, 2003, 10:55 AM
This should work on any site that is spidered. It will not bring more search engines to your site, but once they are there it instructs them as to what to index.
December 7th, 2003, 05:10 PM
Nice tutorial Val.
BTW, even if you're not bothered about giving instructions to search engines, you might want to include a blank robots.txt file anyway. I found that every search engine visiting my site tried to request robots.txt, which left a load of 404 errors in my logs (because the file didn't exist) and made me think I had a broken link somewhere. Same goes for a favicon.ico file when people try to bookmark your site.
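A quick way to convince yourself that a blank file changes nothing for the spiders is to feed an empty rule set to Python's standard urllib.robotparser and ask about any URL. This is only a sketch; the helper name allows_with_blank_file and the example.com address are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allows_with_blank_file(url, user_agent="*"):
    """Return what a parser decides for url when robots.txt has no rules at all."""
    parser = RobotFileParser()
    parser.parse([])  # an empty file yields no lines, so no rules are recorded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allows_with_blank_file("http://www.example.com/any/page.html"))
```

With no rules recorded, the parser grants access, so the empty file only serves to stop the 404 entries in the logs.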
December 7th, 2003, 07:27 PM
Interesting thing: I did a search on Google for robots.txt and found that whitehouse.gov has a robots.txt file disallowing some folders from being spidered...
Sex is like "Social Security". You get a little each month, but it's not enough to live on.