File Integrity Tests

Thread: File Integrity Tests

  1. #1
    Join Date
    May 2002

    File Integrity Tests

    I was wondering if someone could verify something for me.
    Do md5 and similar hash checksums work by taking the entire contents of the file, whatever it may be, and running it through the algorithm to produce a hash? Then, when checking the file's integrity, you run the file through the same algorithm, and if the two hashes are the same, it is the same file? That seems logical to me, but I wanted to make sure that's how it works in programs like 'Tripwire'.


  2. #2
    Master-Jedi-Pimps0r & Moderator thehorse13's Avatar
    Join Date
    Dec 2002
    Washington D.C. area
    MD5 works like this:

    1) You download a hash of File A, which for practical purposes is unique to File A.
    2) You download File A and then run the MD5 app on it, generating another hash.
    3) If the hashes match, there is an excellent chance the file hasn't been changed since the creation of the original hash.
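
    The three steps above can be sketched in Python with the standard hashlib module. This is only an illustration: the function names md5_of_file and matches_published_hash are made up for this example, and the published hash is assumed to arrive as a hex string.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, published_hex):
    """True if the file's MD5 equals the published hex digest."""
    # Published hashes often carry stray whitespace or upper-case hex digits.
    return md5_of_file(path) == published_hex.strip().lower()
```

    Calling matches_published_hash("FileA.iso", "<hash from the site>") corresponds to step 3; the file name here is just a placeholder.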
    Our scars have the power to remind us that our past was real. -- Hannibal Lecter.
    Talent is God given. Be humble. Fame is man-given. Be grateful. Conceit is self-given. Be careful. -- John Wooden

  3. #3
    Senior Member
    Join Date
    Mar 2004

    MD5, SHA-1 and other hashes take the whole file as input.
    Try a test: create an MD5 hash of a huge file (like 650 MB),
    change one char using a hex editor, and create a new MD5 hash.
    Any difference?
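
    A scaled-down version of this test, as a rough sketch: a 1 MB in-memory buffer stands in for the 650 MB file, and the helper differing_bits is a name invented for this example.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = bytearray(b"A" * 1_000_000)  # stand-in for the huge file
modified = bytearray(original)
modified[500_000] ^= 0x01               # flip a single bit in one byte

h1 = hashlib.md5(original).digest()
h2 = hashlib.md5(modified).digest()

print(h1.hex())
print(h2.hex())
print(differing_bits(h1, h2), "of 128 bits differ")
```

    For a well-behaved hash one would expect roughly half of the 128 bits to differ, but the exact count depends on the input.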

    In fact, hashes are one part of the database that Tripwire creates.
    Thus, it is very easy to create a Tripwire-like program:

    - create a table like "filename MD5-hash SHA1-hash"
    using a trusted MD5 and SHA1 generator.
    - store this table on a very secure and different machine or burn
    the table and generators on read-only media.
    - recreate the table from time to time using trusted hash
    generators and check for differences.
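
    The recipe above could look roughly like this in Python. It is a minimal sketch, not how Tripwire itself is implemented: the table is a plain dict, the names file_digests, build_table and compare_tables are invented here, and storing the baseline safely (second machine, read-only media) is left out.

```python
import hashlib
import os

def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for one file, reading it only once."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

def build_table(root):
    """Map each file path (relative to `root`) to its (MD5, SHA-1) pair."""
    table = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            table[os.path.relpath(full, root)] = file_digests(full)
    return table

def compare_tables(baseline, current):
    """Report files whose hashes changed, plus added and removed files."""
    common = baseline.keys() & current.keys()
    return {
        "changed": sorted(p for p in common if baseline[p] != current[p]),
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
    }
```

    Run build_table once on a known-good system, keep the result somewhere the attacker cannot reach, and later feed it to compare_tables together with a fresh table.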

    /edit: note that one crucial ingredient of hashing algorithms is
    to create completely different hashes, even if the initial files are
    very similar. I predict that the two MD5 hashes of this huge file
    will be completely different, even if you only change one char out
    of ~650 million! This is one reason why MD5 and SHA1 became

    If the only tool you have is a hammer, you tend to see every problem as a nail.
    (Abraham Maslow, Psychologist, 1908-70)

  4. #4
    () \/V |\| 3 |) |3\/ |\|3G47|\/3
    Join Date
    Sep 2002
    Originally posted here by thehorse13
    3) If the hashes match, there is an excellent chance the file hasn't been changed since the creation of the original hash.
    I've been writing some programs in C# with the method GetHashCode, which uses MD5. As you mentioned in your 3rd point, "there is an excellent chance"... My books indicate that an exact match is not guaranteed even if the file has not been changed. Why is that? And do you know what the odds are of getting the same hash value when the files have been changed, or the other way around: getting different hash values when the files are indeed the same?

    Go Finland!
    Deviant Gallery

  5. #5

  6. #6
    Senior Member
    Join Date
    Jun 2003
    So there is the ability to potentially crack MD5 as well as SHA-1.
    But there are some file integrity programs that do both.

    Would you say it's theoretically impossible to spoof a file that has a unique MD5 and SHA-1 hash?

    I mean, while you can pad, change, or otherwise alter a file to produce a matching cryptographic sum with one algorithm, the other algorithm does not use the same mathematical procedure. So you would have to find something that satisfies both.
    That which does not kill me makes me stronger -- Friedrich Nietzsche

  7. #7
    Senior Member
    Join Date
    Mar 2004

    Sorry for the late attempt to answer your question.

    In short: looking at the pseudocode of SHA1[1] and MD5[2], I am pretty sure
    that there is no ambiguity in the algorithm that would let the identical file produce
    two different outputs. Of course, there are collisions (see below), but this is
    in the nature of a hash function and a completely different issue! There is
    also a good Q&A[3] available.
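
    An empirical check of this (not a proof), assuming Python's hashlib as the MD5/SHA-1 implementation: hash the same message many times, and also feed it in pieces, and see whether more than one distinct digest ever comes out.

```python
import hashlib

message = b"the same message, hashed repeatedly"

# Collect every distinct digest seen over 1000 runs of each algorithm.
md5_runs = {hashlib.md5(message).hexdigest() for _ in range(1000)}
sha1_runs = {hashlib.sha1(message).hexdigest() for _ in range(1000)}

# Feeding the message in two pieces must give the same result as one call.
piecewise = hashlib.md5()
piecewise.update(message[:10])
piecewise.update(message[10:])

print(len(md5_runs), len(sha1_runs))       # 1 1
print(piecewise.hexdigest() in md5_runs)   # True
```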

    The odds:
    - probability=0 for two different outputs based on the same "message"

    - MD5: A collision (finding 2 messages with the same hash) can be found within a few
    hours, but there is not (yet?) a pre-image attack (finding a message for a given hash).
    The odds for the two cases are completely different. Nevertheless, stripwire seems to
    show that code blocks can be altered without altering the hash, which renders
    MD5 untrustworthy.

    - SHA1: A collision can be found using supercomputers. Again, there is no
    pre-image attack known. Note that a hash is said to be broken
    when an attack is known that is more efficient than brute force.

    A bit more lengthy (and somewhat mathematical):

    The first issue has been answered: Is there a chance for a collision? Yes, because
    the hash function is not injective. The domain (all possible messages) is effectively
    infinite, while the codomain has a finite number of elements. How probable is it?
    It depends: is it a (second) pre-image attack, or is it a collision attack (finding a pair
    of messages whose hashes are identical)? The security of a hash function is defined via
    the probability of finding a collision, since this is much simpler (see and follow the link
    provided by Negative). A very good general read, which defines the terms used here in a clear way, is [4].
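
    To make the collision case concrete, here is a toy sketch that searches for two different messages with the same hash. It deliberately truncates MD5 to 24 bits so the search finishes in moments; truncated_md5 and find_collision are names invented for this example, and nothing here attacks full MD5. For an n-bit hash, a generic (birthday) search needs on the order of 2^(n/2) tries, so about 2^12 here versus 2^64 for the full 128 bits; the practical MD5 attacks do better by exploiting the algorithm's structure.

```python
import hashlib
from itertools import count

def truncated_md5(message: bytes, bits: int = 24) -> int:
    """Keep only the top `bits` bits of the MD5 digest (a toy, weakened hash)."""
    full = int.from_bytes(hashlib.md5(message).digest(), "big")
    return full >> (128 - bits)

def find_collision(bits: int = 24):
    """Hash distinct messages until two of them share a truncated hash."""
    seen = {}
    for i in count():
        message = b"message-%d" % i
        tag = truncated_md5(message, bits)
        if tag in seen:
            # Two different inputs, one output: the function is not injective.
            return seen[tag], message, i + 1
        seen[tag] = message

first, second, tries = find_collision()
print(first, second, "collide after", tries, "tries")
```

    Since there are only 2^24 possible outputs, a collision is guaranteed after at most 2^24 + 1 distinct messages; in practice it shows up far sooner.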

    The second issue has to do with a completely different property of the hash function:
    its non-ambiguity. The very purpose of a hash function is to detect changes in the
    file. A good hash function behaves "chaotically": a small change in the message results
    in a large change in the hash. I cannot prove that there is no ambiguity in the main
    hash functions like MD5 and SHA1, but the pseudocode has no "randomness"
    in it, as far as I can see. It is a well-defined algorithm, and therefore very much likely

    Could you provide me a link to the book, which is indicating such a thing?

    S3cur|ty4ng31, the odds for a pre-image attack even on MD5 are pretty
    low, and such an attack is still infeasible for SHA-1. I would say it is very unlikely to find
    collisions for both hashes at the same time, but I do not know of a proof that
    it is impossible. With today's paranoia, people claim that even such a double hash
    will eventually be broken.
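
    For illustration, a double-hash check might be sketched like this. The names dual_fingerprint and passes_dual_check are invented for the example, and it says nothing about how hard a simultaneous collision really is; it only shows that a forged file would have to satisfy both comparisons at once.

```python
import hashlib

def dual_fingerprint(data: bytes):
    """Return the (MD5, SHA-1) hex digests of the same data."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

def passes_dual_check(data: bytes, expected_md5: str, expected_sha1: str) -> bool:
    """Accept the data only if BOTH digests match the trusted values."""
    md5_hex, sha1_hex = dual_fingerprint(data)
    return md5_hex == expected_md5.lower() and sha1_hex == expected_sha1.lower()
```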


