MD5 - Not as safe as once believed

***hypronix*** · December 8th, 2004, 10:24 PM

SHA-0 has already been compromised, and last I heard SHA-1 was "in the works", so to speak. It is obvious, if still impractical to exploit, that hash functions are not collision-free [because they have a limited output]. However, getting to the point where you can generate files with the same hash... that's something completely different. It won't be long until we see carefully engineered apps with NOP sleds placed in such a way that the final hash is the same as the original... while the other modifications to the file remain fully operational.

Yet for the time being it does seem like a good idea to move to at least SHA-1.

***Soda_Popinsky*** · December 8th, 2004, 10:28 PM

So what's ahead then? What algorithms are next in line?

And a quick question: as of now it isn't "possible" for an attacker to, say, replace a file on a website with that same file + backdoor and maintain the MD5, right? I haven't sat down with the paper yet, but it looks like the differences between fire and ice are in how they are encrypted? So for the PoC's sake, they used a file for which it would be easy to create another file with the same MD5 (I think they call it the doppelganger), but as of now could they take a program like netcat and create a doppelganger? Or are they just showing evidence that it might soon be possible?

***hogfly*** · December 8th, 2004, 11:36 PM

MD5, while under "attack", is still NIST approved, and therefore should be able to withstand any scrutiny. I don't know if this PoC will begin to really affect anything; it's too early to tell. Has anyone here ever had an MD5 collision? I know I never have.
Obviously SHA-1 or SHA-256 is the way to move, but MD5 is still viable as a means of verification.

**netRealm** · December 8th, 2004, 11:41 PM

I'm not very knowledgeable on this subject, but what if we use two different hash methods? If we use both MD5 and SHA-1 on a file, how hard would it be to make modifications that would be undetectable to both? From what I understand, it would be very difficult, because a modification that's undetectable to one hash method could be detected by the other. Don't know if it would work, but I think redundant security is a good policy.
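For what it's worth, here is a minimal sketch of the two-hash idea using Python's standard `hashlib`. The function names `digests` and `matches` and the sample byte strings are made up for illustration. It only shows the mechanics of the check and says nothing about how much harder a joint collision really is.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of the same bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def matches(data: bytes, published: dict) -> bool:
    """Accept the data only if BOTH published digests agree."""
    return digests(data) == published


# Hypothetical contents standing in for a real download.
original = b"original release contents"
published = digests(original)

print(matches(original, published))            # unmodified copy passes
print(matches(original + b"\x90", published))  # one extra byte fails
```

A modification gets through only if it collides under both functions at once, which is the property being asked about above.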

***hypronix*** · December 9th, 2004, 12:32 AM

Well, AFAIK that's why security certificates from sites include both a SHA and an RSA [IIRC] signature, so as to give more than one way of checking authenticity. If both hashes are included, then it should be exponentially more complex [even if, say, a collision for SHA-1 were found] to create a second file that differs from the original yet has the same hashes.

**gore** · December 9th, 2004, 12:49 AM

And anyone who wants to badly enough will do it. Imagine if you could trojan Microsoft updates: are you saying some ass wouldn't LOVE to have that many zombies? They would work on this night and day to do so, no matter how hard it may be.

***chsh*** · December 9th, 2004, 02:57 AM

Originally posted here by thehorse13
Seriously though. MD5 collisions are pretty scary (limited output yet infinite input). To better understand my issue, let's say that you have some PWs that are hashed and you are able to mod a PW to match the original MD5 hash; the new PW you set will work, so you no longer have to brute force or crack PWs that are MD5 hashed. This is only one example of collisions (other algorithms have the same issue), but think of what will happen when exploits/softwarez come out that allow for quick controlled collisions. Time to look into other algos as Striek suggests.

I think you are failing to recognize the actual scope of the issue. It's more for long strings of data that it might be an issue. It would be difficult to impossible to end up with a string less than 32 bytes long (such as passwords/phrases) that had a different source but the same MD5 output.
Saying "AHA, MD5 is compromised" when someone can inject a few arbitrary blocks into a file and break it is silly.
If you could jump with enough force you could achieve escape velocity; it doesn't mean it's necessarily going to happen.

Additionally, your dislike for Tripwire seems misguided given that it can use other hashing algorithms.

For those who scoff at such notions, keep in mind what platforms use this exact model for passwords. Our good friends at Cisco and just about every *nix OS on the planet.

Erm, since when do *nixes use MD5 to store passwords? Have I been asleep?

***The Grunt*** · December 9th, 2004, 03:59 AM

Erm, since when do *nixes use MD5 to store passwords? Have I been asleep?

Slack uses it by default IIRC.

***Tim_axe*** · December 9th, 2004, 08:46 AM

phpBB uses (or at least used) MD5 to store password hashes.

Seriously though. MD5 collisions are pretty scary (limited output yet infinite input). To better understand my issue, let's say that you have some PWs that are hashed and you are able to mod a PW to match the original MD5 hash; the new PW you set will work, so you no longer have to brute force or crack PWs that are MD5 hashed. This is only one example of collisions (other algorithms have the same issue), but think of what will happen when exploits/softwarez come out that allow for quick controlled collisions. Time to look into other algos as Striek suggests.

Given only an MD5 hash (output), you have no idea what the input is. It could be the hash of a large Word document or the hash of a password. The problem is that given the output, you don't know what the input was [1]. So we aren't at the point where we can say "Give me an input that produces hash_x" without knowing the input. Currently these methods analyze the input and find ways to intermix/change some data to produce the same MD5 by exploiting how MD5 is computed. We can't work backwards: as of right now, we would have to somehow know the password we are trying to crack as a prerequisite to calculating a colliding input that gets us into the system. By the time we can actually use it for this, we already have a better solution (the actual password).

I too agree that the collisions are scary. But for now we are left out in the cold on the real developments: we only know of two vectors that hash the same, so we can only substitute those two for each other to produce the same MD5 hash, and we can only do this where MD5 is at its initial state. [2]

Judging from the article, they (Wang & Joux) are working on ways to find similar substitutions in the files that are more predictable and controllable, and possibly having the ability to find a place to plant whatever you want undetected by MD5.

As of right now, you could only find the changes if you were looking for them (SHA-1 hashes compared along with MD5, looking for irregularities; the stripwire PoC does this), and they can only do something if there is a backdoor/timebomb in the program that is designed to look for this. In the PoC, stripwire looks for these changes (with SHA-1), and upon finding the trigger condition in Fire (a specific SHA-1 hash), the timebomb ticks and explodes.

Take a look at the attached examples, which show the limits of the current code. I produced two JPEG image files of the AntiOnline banner. The underlying JPEG image data is the same, as I only changed the headers by prepending the example vectors. This has the side effect of causing most programs to not recognize the images as images. If you grab IrfanView, which is able to find JPEG headers later in the file than they should be, it will ignore the first 128 bytes, which are included only to fool the MD5 computation. MD5 the two images, and they will have the same MD5. Do a binary comparison; they are different. To make this work, the modified files are 128 bytes longer than the original JPEG'ed logo. If we could calculate the correct values to put into the file at a later place, we could potentially render the image correctly without changing the file size [3]. But we (the general public) do not currently have the knowledge to do what Wang & Joux are doing. My examples are just a rather weak attempt at demonstrating what they have so far shown and explained to everyone.
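If you'd rather script the check than do it by hand, something like the following sketch should work. It assumes Python with the standard `hashlib` and `pathlib` modules, and the function names `compare` and `is_md5_doppelganger` are made up here. It does not generate colliding files; it only reports whether two files you already have differ in content while sharing an MD5.

```python
import hashlib
import sys
from pathlib import Path


def compare(path_a: str, path_b: str) -> dict:
    """Compare two files byte-for-byte and by their MD5 and SHA-1 digests."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return {
        "same_bytes": a == b,
        "same_md5": hashlib.md5(a).digest() == hashlib.md5(b).digest(),
        "same_sha1": hashlib.sha1(a).digest() == hashlib.sha1(b).digest(),
    }


def is_md5_doppelganger(path_a: str, path_b: str) -> bool:
    """True when the contents differ but the MD5 digests are identical."""
    result = compare(path_a, path_b)
    return result["same_md5"] and not result["same_bytes"]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py first_file second_file
    print(compare(sys.argv[1], sys.argv[2]))
    print("MD5 doppelganger:", is_md5_doppelganger(sys.argv[1], sys.argv[2]))
```

Run against the two attached images, the expectation from the description above is that the MD5 comparison reports a match while the byte comparison does not.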

[1] - We know that a 128-bit MD5 hash output can represent an infinite number of inputs. Because of MD5's inherent collisions, you can't be 100% sure what the input is, because the input you used is just as valid as any other input imaginable (that has the same output).

[2] - Simply put, the two example test vectors can only be substituted for each other at the beginning of the file, where the original file begins with one of them. Thus we can construct a file that begins with them to work, but we're unlikely to come across this given an arbitrary file. The vulnerability itself is applicable to changing virtually anything anywhere, but we've only been provided with rare vectors that were probably considered safe (by happening primarily in controlled situations instead of in the wild) by Wang & Joux.

[3] - Simply, we don't know how to calculate these "doppelganger" blocks. If we could, we could include them in arbitrary files while preserving the MD5 hash. Wang & Joux are leaps and bounds ahead of us here, as they are leading the way by already doing this.
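To make the "limited output, infinite input" point from [1] concrete, here is a toy sketch in Python. It is not the Wang attack and tells you nothing about how to build real MD5 collisions. It simply truncates MD5 to 16 bits (an assumption made so the search finishes instantly) and hashes the decimal strings "0", "1", "2", ... until two of them land on the same truncated value. The helper names are made up.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, nbytes: int = 2) -> bytes:
    """Keep only the first nbytes of the MD5 digest (a deliberately tiny hash)."""
    return hashlib.md5(data).digest()[:nbytes]


def find_truncated_collision(nbytes: int = 2):
    """Hash '0', '1', '2', ... until two inputs share a truncated digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg


a, b = find_truncated_collision()
print(a, b, truncated_md5(a).hex())
```

With only 65,536 possible outputs, a repeat is guaranteed by the 65,537th input at the latest and usually shows up after a few hundred. The full 128-bit output works the same way in principle, just with a search space far too large to brute force like this.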

***hypronix*** · December 9th, 2004, 09:02 AM

I don't think it would be at all possible to identify the original data that has been passed through the hash because a hash function has a fixed output length which means data is being mangled around. Take a look at the algorithm's description if you will... I don't think there's a readily available method to reverse it.
