August 18th, 2005, 12:24 PM
How long does search engine cache last?
Not sure if this is the right forum to post under, but I was wondering if anyone knows the answer: when you delete a web page, it still shows up as a search result because the page has been cached by the search engine (e.g. Google). How much time must pass before it no longer shows up as a search result and effectively "disappears"?
August 18th, 2005, 01:18 PM
A search engine such as Google takes a snapshot of the web page while indexing it.
When you remove or shut down your page and want its content removed
from the cache, you should contact the search engine.
But there are other ways: you can prevent Google from returning cached versions
by using the NOARCHIVE meta tag. It takes effect the next time the index is
updated. How often the spider visits your page, however, depends on a number
of parameters and is not fixed. Furthermore, you can restrict spiders by placing
a robots.txt file on your server.
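To make the NOARCHIVE part concrete, here is a minimal Python sketch that checks whether a page carries a robots meta tag with that directive. The sample page and the names RobotsMetaParser and robots_directives are made up for illustration; only the standard library's html.parser module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of lowercased robots meta directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page: the meta tag asks search engines not to show a cached copy.
PAGE = """<html><head>
<title>Example page</title>
<meta name="robots" content="NOARCHIVE">
</head><body>Hello</body></html>"""

if __name__ == "__main__":
    print("noarchive" in robots_directives(PAGE))
```

The tag itself is just the one meta line inside the head of the page; the rest of the script only reads it back.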
Note that waybackmachine.org may have a lot of snapshots stored...
With some settings in the robots.txt file you can have those entries removed;
however, read the FAQ on the page:
August 19th, 2005, 09:27 AM
That was helpful, but I have a couple more q's:
Is it possible to create a robots.txt if my website is a crappy www.geocities.com/blahblah type website?
If so, how exactly do I create a robots.txt file? Can I just open up notepad in Windows and type the following stuff, then save the .txt file and put it in my home directory?
Or do I need some sort of special program? I have no programming experience (only basic HTML skills).
August 19th, 2005, 01:16 PM
Yup, that's it.
Just open Notepad, enter your rules,
and save the file as robots.txt,
as you could have read at the link in sec_ware's post.
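One catch with the host, though: crawlers only request robots.txt from the root of a site (e.g. www.geocities.com/robots.txt), so a copy inside a subdirectory like /blahblah/ gets ignored.
On that kind of hosting, use the robots meta tag in your pages instead.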
Note: a robots.txt file doesn't actually protect you from malicious bots. It does, however, stop legitimate bots (Google, MSN, archive.org, etc.) from indexing you.
Even more on the robots.txt here: http://www.robotstxt.org/
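If you want to see how a well-behaved bot reads such a file, here is a rough Python sketch using the standard library's urllib.robotparser. The rules in ROBOTS_TXT, the example.com URLs, and the helper names build_parser and is_allowed are all assumptions for illustration, not the contents of any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical rules: every bot is asked to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text (no network access) and return the parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, agent, url):
    """Return True if the rules let the named user agent fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(is_allowed(rules, "Googlebot", "http://www.example.com/index.html"))
    print(is_allowed(rules, "Googlebot", "http://www.example.com/private/a.html"))
```

This only shows what a polite crawler would conclude from the file; nothing here forces a bot to obey it.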