
Thread: How long does search engine cache last?

  1. #1
    Junior Member
    Join Date
    Aug 2005

    How long does search engine cache last?

    Not sure if this is the right forum to post under, but I was wondering if anyone knows the answer: when you delete a web page, it still shows up as a search result because the search engine (e.g. Google) has cached the page. How much time must pass before it no longer shows up as a search result and has effectively "disappeared"?


  2. #2
    Senior Member
    Join Date
    Mar 2004

    A search engine such as Google takes a snapshot of a web page while indexing it.
    If you remove or shut down your page and want its content removed
    from the cache, you should contact the search engine, e.g. Google[1].

    But there are other ways. You can prevent Google from returning cached versions
    by using the NOARCHIVE[2] meta tag. It takes effect the next time the index is
    updated. How often the spider visits your page, however, depends on a number
    of parameters and is not fixed. Furthermore, you can restrict spiders by configuring
    robots.txt[3] on your server.
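
To make the NOARCHIVE idea concrete, here is a minimal Python sketch that checks whether a page carries a robots meta tag with the noarchive directive. The sample page, the class name and the helper name are made up for illustration; the tag form `<meta name="robots" content="noarchive">` is the commonly documented one, but check your search engine's own help pages before relying on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so fall back to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html_text):
    """Return True if the page asks robots not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """<html>
<head>
<title>Example page</title>
<meta name="robots" content="noarchive">
</head>
<body>Hello</body>
</html>"""

if __name__ == "__main__":
    print(has_noarchive(SAMPLE_PAGE))
```

The tag only matters once the spider has fetched the page again, so the cached copy can stay visible until the next crawl.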

    Note that waybackmachine.org[4] may have a lot of snapshots stored.
    With the following settings in the robots.txt file you can have those entries
    removed; however, read the FAQ on that page first:
    User-agent: ia_archiver
    Disallow: /
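
If you want to see what those two lines mean to a crawler that honours robots.txt, the sketch below feeds them to Python's standard `urllib.robotparser`. The example.com URL and the "Googlebot" agent name are placeholders for illustration, and the helper name is made up. This only models how a compliant parser reads the rules, not what the archive actually does with existing snapshots.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above.
RULES = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if a compliant crawler named user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, is_allowed(RULES, agent, page))
```

Because the record names only ia_archiver, other crawlers are unaffected by it.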

    [1] http://www.google.com/terms_of_service.html
    [2] http://www.i18nguy.com/markup/metatags.html
    [3] http://www.searchengineworld.com/rob...s_tutorial.htm
    [4] http://www.waybackmachine.org/

  3. #3
    Junior Member
    Join Date
    Aug 2005
    That was helpful, but I have a couple more q's:

    Is it possible to create a robots.txt if my website is a crappy www.geocities.com/blahblah type website?

    If so, how exactly do I create a robots.txt file? Can I just open up notepad in Windows and type the following stuff, then save the .txt file and put it in my home directory?

    User-agent: *
    Disallow: /

    Or do I need some sort of special program? I have no programming experience (only basic HTML skills).


  4. #4
    Leftie Linux Lover the_JinX's Avatar
    Join Date
    Nov 2001
    Beverwijk Netherlands
    Yup, that's it.

    Just open Notepad, enter
    User-agent: *
    Disallow: /
    and save as robots.txt,

    as you could have read in link [3] in sec_ware's post.

    And the host doesn't matter: GeoCities or some really 1337 hosting provider,
    it should all work.

    Note: a robots.txt file doesn't actually protect you from malicious bots. It does, however, stop legitimate bots (Google, MSN, archive.org, etc.) from indexing you.
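
Notepad is all you need, but if you would rather script it, here is a rough Python sketch that writes the same two lines and then reads them back the way a well-behaved bot would. The function names, the temporary directory and the agent names in the loop are illustrative assumptions; with `User-agent: *` any agent name gives the same answer.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# Same content you would type into Notepad.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def write_robots_txt(directory):
    """Write a disallow-everything robots.txt into directory and return its path."""
    path = Path(directory) / "robots.txt"
    path.write_text(DISALLOW_ALL, encoding="ascii")
    return path


def compliant_bot_may_fetch(robots_path, user_agent, url):
    """Return True if a bot that obeys robots.txt would fetch url."""
    parser = RobotFileParser()
    parser.parse(Path(robots_path).read_text(encoding="ascii").splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A temporary directory stands in for the site's top-level folder.
    with tempfile.TemporaryDirectory() as site_root:
        robots = write_robots_txt(site_root)
        for agent in ("Googlebot", "msnbot", "ia_archiver"):
            print(agent, compliant_bot_may_fetch(robots, agent, "http://www.example.com/"))
```

The check only describes bots that choose to obey the file; a malicious bot simply never asks.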

    Even more on the robots.txt here: http://www.robotstxt.org/
