How secure is my custom hashing algorithm? Are there any known vulnerabilities?

Printable View

February 25th, 2025, 06:23 PM
user123127317823618723

How secure is my custom hashing algorithm? Are there any known vulnerabilities?
I've implemented a custom hashing algorithm in Python, and I would like to get feedback on its security. The algorithm involves complex padding, bitwise operations, and a mix of rotation functions. My goal is to understand whether there are any known weaknesses, such as vulnerabilities to brute-force attacks, collision resistance, or weaknesses in its internal state management.

Here?s a simplified overview of how the algorithm works:
- Salt generation: The salt is generated using the length of the input and some constants.
- Padding: The input is padded using a custom padding scheme that mimics wide-pipe padding.
- State initialization: The algorithm initializes an internal state with 32 words of 64 bits each, using constants.
- Processing: The input is processed in blocks, expanding each block and performing numerous rounds of mixing operations using bitwise shifts and XOR.
- Final hash: The final hash is obtained by concatenating the processed state words.
I am specifically concerned with the following potential weaknesses:
- Brute-force resistance: Is my algorithm sufficiently resistant to brute-force attacks?
- Collision resistance: Does my algorithm exhibit any properties that could lead to collision vulnerabilities?
Here is the Python code for the algorithm:

Code:

def circular_left_shift(x, shift, bits=64): return ((x << shift) & ((1 << bits) - 1)) | (x >> (bits - shift)) def circular_right_shift(x, shift, bits=64): return (x >> shift) | ((x << (bits - shift)) & ((1 << bits) - 1)) def ultimate_resistant_hash(text, rounds=128): salt = ((len(text) * 0xF0E1D2C3B4A59687) ^ 0x1234567890ABCDEF) & ((1 << 64) - 1) data = str(salt) + text data_bytes = data.encode('utf-8') block_size = 256 original_bit_length = len(data_bytes) * 8 data_bytes += b'\x80' while (len(data_bytes) % block_size) != (block_size - 32): data_bytes += b'\x00' data_bytes += original_bit_length.to_bytes(32, 'big') state = [ 0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1, 0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179, 0x243f6a8885a308d3, 0x13198a2e03707344, 0xa4093822299f31d0, 0x082efa98ec4e6c89, 0x452821e638d01377, 0xbe5466cf34e90c6c, 0xc0ac29b7c97c50dd, 0x3f84d5b5b5470917, 0x452821e638d01377, 0xbe5466cf34e90c6c, 0xc0ac29b7c97c50dd, 0x3f84d5b5b5470917, 0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1, 0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179, 0x243f6a8885a308d3, 0x13198a2e03707344, 0xa4093822299f31d0, 0x082efa98ec4e6c89 ] for i in range(0, len(data_bytes), block_size): block = data_bytes[i:i+block_size] M = [int.from_bytes(block[j*8:(j+1)*8], 'big') for j in range(32)] W = M[:] + [0] * (128 - 32) for t in range(32, 128): s0 = (circular_right_shift(W[t-15], 1) ^ circular_right_shift(W[t-15], 8) ^ (W[t-15] >> 7)) & ((1<<64)-1) s1 = (circular_right_shift(W[t-2], 19) ^ circular_right_shift(W[t-2], 61) ^ (W[t-2] >> 6)) & ((1<<64)-1) W[t] = (W[t-16] + s0 + W[t-7] + s1) & ((1<<64)-1) for t in range(rounds): for j in range(32): a = state[j] b = state[(j+1) % 32] c_val = state[(j+3) % 32] d = state[(j+7) % 32] f_val = (circular_right_shift(a, (j+1) % 64) ^ circular_left_shift(b, (j+3) % 64)) + (c_val & d) m_val = W[(t + j) % 128] temp = (state[j] + f_val + m_val + (t * j) + ((state[(j+5) % 32] * 0x9e3779b97f4a7c15) & ((1<<64)-1))) & ((1<<64)-1) state[j] = temp ^ circular_left_shift(temp, (t + j) % 64) state = state[1:] + state[:1] for j in range(32): state[j] = state[j] ^ M[j % 32] digest = ''.join(format(x, '016x') for x in state) return digest

Can anyone provide insights into the potential vulnerabilities of this algorithm, or suggest improvements? (In this code, I have not included the part where the word is selected or the libraries used for handling files and user input, as they are not relevant to the analysis of the algorithm's functionality.)
May 31st, 2025, 01:50 AM
waywardgeek

Not bad for a learning exercise

I actually teach writing cryptography algorithms from scratch internally at my company, so I've seen a lot of things like this. If you submitted this as your first ever hash function, I'd say you passed the challenge. That said...

You asked how secure it looks. I am concerned about the last step in the round:

state[j] = temp ^ circular_left_shift(temp, (t + j) % 64)

I strongly prefer steps that are reversible, and I see for example your multiplication is by an odd constant, which is reversible. However, this step is not reversible. For example, if the result is 0, then any temp value that is the same when rotated by t + i could result in 0, so it is not reversible. Without that, you'd have a hard time convincing me that you don't rapidly lose entropy while mixing. I recommend instead XORing state in some rotated form into state[j+1 % 32] or something like that, which is clearly reversible. If I wanted to find a collision, I think I would probably attack this point, I haven't tried, but I'd give myself maybe a 50-50 chance of getting that approach to work in an attack on this algorithm.

Other than that, I feel your padding is funky and scary. UTF-8 encoding is not super fast, expands the input data significantly, and is somewhat complex compared to regular padding algorithms. However, I think it works, and do not see any attacks, so it passes the smell test.

I like that you chose 128 rounds. Without any real cryptanalysis from experts, you're probably subject to some pretty bad differential attacks, and if you want to be secure, do a crap ton of rounds! I tyiplically choose 1,000 rounds, but my mixing is not as strong typically as one of your rounds, so I approve of 128.

Obviously, this is very slow compared to standard algorithms like SHA256 or BLAKE2d. Good. That makes sense for a hacked-together hopefully maybe secure algorithm. The digest is on the large side, at 256 bytes (10X longer than sha256).

Overall, not bad for a first try.
May 31st, 2025, 04:33 AM
waywardgeek

Sorry for being super frustrated, but I spent 2 hours replying to this post with a well though out detailed 2-page reply. This one will be much quicker since I don't trust this forum to actually post my work.

I teach writing your own crypto at work, and I've seen a lot of code like this. In short you pass, but I'm still going to PWN it.

The last step in your round function is not reversible. You likely are losing entropy in your state, and if you do enough rounds you may find a repeating pattern in the state. Equally concerning is that I can look for different input messages that result in different temp values, but which have the property that after the rotation and XOR, they collide. This seems like way too much power to give to a hacker even like me.

Overall, I like tis work. It is a solid first effort. I like that you chose 128 rounds, rather than guessing that maybe your round function is very secure, even without any cryptanalysis. This is the key: If you want any chance at security without extensive cryptanalysis, you need to throw a ton of rounds at the problem. Most likely, there is a killer differential attakc that will break many of your rounds. With 128, you have a chance of being secure, other than for the non-invertable last step.

For that last step, just XOR temp on some other state value so it is reversible. That ensures you never lose entropy during hashing.

The padding functin is wonky. Converting to UTF8 will make security folks roll their eyes, but I agree it works. I'm not a fan of your padding scheme, but I see now security flaws.

So, nice work overall, but fix your last rount step. Then... maybe it is secure. It probably will achieve "Secure against Bill", which is a reasonably good first step.
May 31st, 2025, 04:34 AM
waywardgeek

I see my reply once again dissapeard. There's a 1 second flash of text before it all is erased, and maybe that says something useful... I'll just post it again and see if I have better luck:

Sorry for being super frustrated, but I spent 2 hours replying to this post with a well though out detailed 2-page reply. This one will be much quicker since I don't trust this forum to actually post my work.

I teach writing your own crypto at work, and I've seen a lot of code like this. In short you pass, but I'm still going to PWN it.

The last step in your round function is not reversible. You likely are losing entropy in your state, and if you do enough rounds you may find a repeating pattern in the state. Equally concerning is that I can look for different input messages that result in different temp values, but which have the property that after the rotation and XOR, they collide. This seems like way too much power to give to a hacker even like me.

Overall, I like tis work. It is a solid first effort. I like that you chose 128 rounds, rather than guessing that maybe your round function is very secure, even without any cryptanalysis. This is the key: If you want any chance at security without extensive cryptanalysis, you need to throw a ton of rounds at the problem. Most likely, there is a killer differential attakc that will break many of your rounds. With 128, you have a chance of being secure, other than for the non-invertable last step.

For that last step, just XOR temp on some other state value so it is reversible. That ensures you never lose entropy during hashing.

The padding functin is wonky. Converting to UTF8 will make security folks roll their eyes, but I agree it works. I'm not a fan of your padding scheme, but I see now security flaws.

So, nice work overall, but fix your last rount step. Then... maybe it is secure. It probably will achieve "Secure against Bill", which is a reasonably good first step.
May 31st, 2025, 06:50 AM
waywardgeek

This is a test. I already lost a couple of hours of work replying to this, where my reply vanished. Sorry for the test post, if this comes through.
May 31st, 2025, 07:02 AM
waywardgeek

OK, it seems to be working now... I have no idea why my prior posts vanished. Ugh... I hate writing this phreaking post for a 3rd time!

I teach folks how to write their own crypto in an internal class at my company. I see a lot of code like this. I enjoy breaking it. In short, you pass my class with this entry. It's flawed, but a solid first effort.

The main security flaw I see is the last step in your round function where you XOR temp and temp rotated. There are many values of temp that could XOR to 0 for example. This step is not reversible, and I worry your round function loses entropy in each round, leaving the state in a small set of possible states. Also, since clearly different temp values lead to the same outcome, I might try to find hash collisions by abusing this. I'm not sure if it can be done, that would take more work.

To fix it, just XOR the rotated temp value onto some other state value, so it is reversible.

With that fixed, there's plenty to nit pick, such as using UTF-8 encoding to simplify your padding scheme which is ugly. However, you do have a proper mix of rotations, XOR, and addition / multiplication, and the mixing _looks_ reasonable. You also went for 128 rounds rather than something like 10 because it would run faster. That's a good idea if you want to have any hope of your hash function being secure. Cryptanalysts will find differential attacks that break many of your rounds, and especially without any cryptanalysis, your best hope for security is many rounds.

This algorithm is very slow compared to accepted standard functions like sha256. Good for you! I've never once had a _useful_ algorithm created in my class: hopefully some portion of them are actually secure. That's the best we can hope for really.

If you fix the last step in your round function, then I believe it will be "secure against Bill". It isn't a bad security level :)
May 31st, 2025, 07:31 AM
waywardgeek

I thought of one more security issue, which is created when you fix the non-reversible step in your round function, so it's my fault. The XORing in the message at the end does seem to defeat the attack, though. Mixing in W in your round function also defeats it.

OK, it's not really much of a concern....

Anyway, I'd fee better if you only output 512 bit digests, or maybe 1024 bits. That keeps the attacker from knowing the whole state, which blocks length extension attacks. SHA256 is subject to length extension attacks, as well as cache timing attacks, which your algorithm seems to defend against.