-
August 18th, 2005, 12:24 PM
#1
Junior Member
How long does search engine cache last?
Not sure if this is the right forum to post under, but I was wondering if anyone knows the answer: when you delete a web page, it still shows up as a search result because the page has been cached by the search engine (e.g. Google). How much time must pass before it no longer shows up as a search result and effectively "disappears"?
Thanks.
-
August 18th, 2005, 01:18 PM
#2
Hi
A search engine, e.g. Google, takes a snapshot of the web page while indexing it.
When you remove or shut down your web page and want its content removed
from the cache, you should contact the search engine, e.g. Google[1].
But there are other ways: you can prevent Google from returning cached versions
by using the NOARCHIVE[2] meta tag. It takes effect at the next index update.
How often the spider visits your page, however, depends on a number
of parameters and is not fixed. Furthermore, you can restrict spiders by configuring
robots.txt[3] on your server.
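To make the NOARCHIVE part concrete: the tag goes in the page's head and usually looks like <meta name="robots" content="noarchive">. Below is a rough Python sketch for checking whether a page already carries it. The names RobotsMetaFinder and has_noarchive and the sample page are made up for illustration, and the script only checks your own HTML; it says nothing about when a search engine will pick the tag up.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noarchive(html_text):
    """Return True if the HTML contains a robots meta tag with noarchive."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noarchive" in finder.directives


# Hypothetical sample page, only for illustration.
page = '<html><head><meta name="robots" content="NOARCHIVE"></head><body>Hi</body></html>'
print(has_noarchive(page))
```

You would feed it the source of your own page (e.g. read from the .html file on disk) instead of the sample string.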
Note that waybackmachine.org[4] may have a lot of snapshots stored...
With some settings in the robots.txt file you can remove those entries;
however, read the FAQ on the page first:
Code:
User-agent: ia_archiver
Disallow: /
Cheers
[1] http://www.google.com/terms_of_service.html
[2] http://www.i18nguy.com/markup/metatags.html
[3] http://www.searchengineworld.com/rob...s_tutorial.htm
[4] http://www.waybackmachine.org/
-
August 19th, 2005, 09:27 AM
#3
Junior Member
That was helpful, but I have a couple more q's:
Is it possible to create a robots.txt if my website is a crappy www.geocities.com/blahblah type website?
If so, how exactly do I create a robots.txt file? Can I just open up notepad in Windows and type the following stuff, then save the .txt file and put it in my home directory?
User-agent: *
Disallow: /
Or do I need some sort of special program? I have no programming experience (only basic HTML skills).
Thanks.
-
August 19th, 2005, 01:16 PM
#4
Yup.. that's it..
just open Notepad and enter
Code:
User-agent: *
Disallow: /
and save as robots.txt
as you could have read in link [3] in sec_ware's post
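And the hosting provider doesn't matter.. GeoCities or some really 1337 host..
As long as the file ends up at the root of the site (crawlers only request /robots.txt from the top level of the host, so on a www.geocities.com/blahblah type address a copy in your own directory will most likely be ignored.. use the NOARCHIVE meta tag there instead)
Note.. a robots.txt file doesn't actually protect you from malicious bots.. It does however stop legitimate bots (Google, MSN, archive.org etc.) from indexing you..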
Even more on the robots.txt here: http://www.robotstxt.org/
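If you want to see how a well-behaved bot would read those two lines, here is a small sketch using Python's standard urllib.robotparser module. The is_allowed helper and the www.example.com URLs are placeholders made up for this example; the two rule sets are the ones posted in this thread. It only shows how the rules are interpreted, it can't tell you whether a given bot actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The rules from this thread: block every bot from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# The variant from post #2: block only the Wayback Machine crawler.
BLOCK_ARCHIVE_ONLY = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.com is a placeholder host, not a real site from this thread.
print(is_allowed(BLOCK_ALL, "Googlebot", "http://www.example.com/index.html"))
print(is_allowed(BLOCK_ARCHIVE_ONLY, "Googlebot", "http://www.example.com/index.html"))
print(is_allowed(BLOCK_ARCHIVE_ONLY, "ia_archiver", "http://www.example.com/index.html"))
```

With the first rule set every bot is turned away; with the second only ia_archiver is, and everything else may still crawl the site.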