-
Robots.txt
OK, just a quick note for those not aware of what a robots.txt file is.
Basically, robots.txt is a text file a webmaster places in the root directory of his site. It tells those search engines that honour it which areas of the site he is OK for them to index.
Q. Do I have to have a robots.txt file to get ranked on a search engine??
A. No, you do not. If no robots.txt file is present, the search engine or spider will just assume that it is OK for it to index your whole site.
Q. Will a robots.txt file help my site get looked at by more search engines??
A. No - in order to read the file they must already be at your site.
Q. So what do I need it for then??
A. Well, it allows you to designate certain areas which you do not wish to be indexed by the search engine. Perhaps you have a test folder where you simply try out new things that you don't want the public to be aware of, etc.
Q. Ok so how do I make one then?
A. Easy, just open a text editor like Notepad and type the following:
User-agent: *
Disallow: /test/
You can add as many directories to this as you wish, with one Disallow line per directory.
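If you have a lot of directories to list, the file can be generated rather than typed by hand. The following is only a rough sketch in Python: the function name build_robots_txt and the directory names "drafts" and "old-site" are made up for illustration and are not part of the original example.

```python
def build_robots_txt(disallowed_dirs, user_agent="*"):
    """Return robots.txt text with one Disallow line per directory."""
    lines = [f"User-agent: {user_agent}"]
    for directory in disallowed_dirs:
        # Normalise to a leading and trailing slash, e.g. "test" -> "/test/"
        path = "/" + directory.strip("/") + "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical directory names, purely for illustration
robots_text = build_robots_txt(["test", "/drafts/", "old-site"])
print(robots_text)
```

Save the printed text as robots.txt and it has the same shape as the hand-typed version above, just with more Disallow lines.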
Q. OK, I have written my file, but how do I know it is working?
A. Simply upload it into the root dir of your site and that should be it. If you wish to check the syntax and make sure everything is correct, you can visit here
or for more info on robots.txt try here
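For a quick local check before uploading, one option is Python's standard urllib.robotparser module. This is a minimal sketch, not a full validator: the helper name is_allowed and the www.example.com URLs are placeholders of mine, and the rules are the two lines from the example above.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the example above
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if the rules in robots_text let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.com is a placeholder host
print(is_allowed(ROBOTS_TXT, "http://www.example.com/test/page.html"))  # False
print(is_allowed(ROBOTS_TXT, "http://www.example.com/index.html"))      # True
```

Once the file is uploaded, the same module can fetch the live copy with set_url() and read() instead of parse(), assuming your site is reachable from where you run the script.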
thanks
v_Ln
-
-
Nice. I'll be sure to do that. Thanks Val!
-
w00t. Good info. I will use that later on when I start my site.
-
Er... Be careful with this... let's think about it for a minute...
You have a set of subfolders on your website that you don't want accessed as a result of a simple web search. So you pop them in the old robots.txt and sit back fat, dumb and happy. Well, if Mr. H4x0|2 wants to know whether you have anything to "hide", all he does is request robots.txt and bingo - there are those closely held secrets for him to see. It's a two-edged sword...
The Moral: Don't place stuff on a web site that you can't have _anyone_ see and still sleep at night...
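To show how little effort that takes, here is a rough Python sketch that pulls every Disallow path out of a robots.txt. The function name listed_paths and the sample rules (including /private-plans/) are invented for illustration; anyone could run the same thing on the text of a real file after requesting it.

```python
def listed_paths(robots_text):
    """Return every path named on a Disallow line."""
    paths = []
    for raw_line in robots_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Invented sample rules, standing in for a file fetched from a site
sample = """\
User-agent: *
Disallow: /test/
Disallow: /private-plans/
"""
print(listed_paths(sample))  # ['/test/', '/private-plans/']
```

In other words, the file works as a map of exactly the folders you would rather nobody looked at.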
-
Hey, that was a great little tute mate. Thanks for posting. I've been doing a lot of web stuff for years, so I knew what it was, but my girlfriend has had a page of her site (about 200,000 hits per day) where she didn't have a robot command and some of the members found out some secrets as to what she was planning to do with the site update-wise. Again, thanks mate. :)
-
Hey Val, does this help on GeoCities or Angelfire sites where the txt file is not present originally?
If not, tell me something to improve such free sites.
txs
-
This should work on any site that is spidered. It will not bring more search engines to your site, but once they are there it instructs them as to what to index :)
v_Ln
-
Nice tutorial Val. :)
BTW, even if you're not bothered about giving instructions to search engines, you might want to include a blank robots.txt file anyway. I found that every search engine visiting my site tried to request robots.txt, which left a load of 404 errors in my logs (because the file didn't exist) and made me think I had a broken link somewhere. Same goes for a favicon.ico file when people try to bookmark your site.
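If you want to convince yourself that a blank file really does leave everything open, here is a small Python sketch. It creates a zero-byte robots.txt in a temporary folder (standing in for your web root, which is my assumption) and runs it through the standard urllib.robotparser module.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# A temporary folder stands in for the real web root here
with tempfile.TemporaryDirectory() as site_root:
    robots_path = Path(site_root) / "robots.txt"
    robots_path.touch()  # creates a zero-byte file

    parser = RobotFileParser()
    parser.parse(robots_path.read_text().splitlines())
    blank_allows_all = parser.can_fetch("*", "/any/page.html")

print(blank_allows_all)  # True: no rules means nothing is disallowed
```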
-
Interesting thing: I did a search on Google for robots.txt and found that whitehouse.gov has a robots.txt file disallowing some folders from being spidered...