How secure is my custom hashing algorithm? Are there any known vulnerabilities?
I've implemented a custom hashing algorithm in Python, and I would like to get feedback on its security. The algorithm involves complex padding, bitwise operations, and a mix of rotation functions. My goal is to understand whether there are any known weaknesses, such as vulnerabilities to brute-force attacks, collision resistance, or weaknesses in its internal state management.
Here?s a simplified overview of how the algorithm works:
- Salt generation: The salt is generated using the length of the input and some constants.
- Padding: The input is padded using a custom padding scheme that mimics wide-pipe padding.
- State initialization: The algorithm initializes an internal state with 32 words of 64 bits each, using constants.
- Processing: The input is processed in blocks, expanding each block and performing numerous rounds of mixing operations using bitwise shifts and XOR.
- Final hash: The final hash is obtained by concatenating the processed state words.
I am specifically concerned with the following potential weaknesses:
- Brute-force resistance: Is my algorithm sufficiently resistant to brute-force attacks?
- Collision resistance: Does my algorithm exhibit any properties that could lead to collision vulnerabilities?
Here is the Python code for the algorithm:
Code:
def circular_left_shift(x, shift, bits=64):
return ((x << shift) & ((1 << bits) - 1)) | (x >> (bits - shift))
def circular_right_shift(x, shift, bits=64):
return (x >> shift) | ((x << (bits - shift)) & ((1 << bits) - 1))
def ultimate_resistant_hash(text, rounds=128):
salt = ((len(text) * 0xF0E1D2C3B4A59687) ^ 0x1234567890ABCDEF) & ((1 << 64) - 1)
data = str(salt) + text
data_bytes = data.encode('utf-8')
block_size = 256
original_bit_length = len(data_bytes) * 8
data_bytes += b'\x80'
while (len(data_bytes) % block_size) != (block_size - 32):
data_bytes += b'\x00'
data_bytes += original_bit_length.to_bytes(32, 'big')
state = [
0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
0x243f6a8885a308d3, 0x13198a2e03707344, 0xa4093822299f31d0, 0x082efa98ec4e6c89,
0x452821e638d01377, 0xbe5466cf34e90c6c, 0xc0ac29b7c97c50dd, 0x3f84d5b5b5470917,
0x452821e638d01377, 0xbe5466cf34e90c6c, 0xc0ac29b7c97c50dd, 0x3f84d5b5b5470917,
0x6a09e667f3bcc908, 0xbb67ae8584caa73b, 0x3c6ef372fe94f82b, 0xa54ff53a5f1d36f1,
0x510e527fade682d1, 0x9b05688c2b3e6c1f, 0x1f83d9abfb41bd6b, 0x5be0cd19137e2179,
0x243f6a8885a308d3, 0x13198a2e03707344, 0xa4093822299f31d0, 0x082efa98ec4e6c89
]
for i in range(0, len(data_bytes), block_size):
block = data_bytes[i:i+block_size]
M = [int.from_bytes(block[j*8:(j+1)*8], 'big') for j in range(32)]
W = M[:] + [0] * (128 - 32)
for t in range(32, 128):
s0 = (circular_right_shift(W[t-15], 1) ^ circular_right_shift(W[t-15], 8) ^ (W[t-15] >> 7)) & ((1<<64)-1)
s1 = (circular_right_shift(W[t-2], 19) ^ circular_right_shift(W[t-2], 61) ^ (W[t-2] >> 6)) & ((1<<64)-1)
W[t] = (W[t-16] + s0 + W[t-7] + s1) & ((1<<64)-1)
for t in range(rounds):
for j in range(32):
a = state[j]
b = state[(j+1) % 32]
c_val = state[(j+3) % 32]
d = state[(j+7) % 32]
f_val = (circular_right_shift(a, (j+1) % 64) ^ circular_left_shift(b, (j+3) % 64)) + (c_val & d)
m_val = W[(t + j) % 128]
temp = (state[j] + f_val + m_val + (t * j) + ((state[(j+5) % 32] * 0x9e3779b97f4a7c15) & ((1<<64)-1))) & ((1<<64)-1)
state[j] = temp ^ circular_left_shift(temp, (t + j) % 64)
state = state[1:] + state[:1]
for j in range(32):
state[j] = state[j] ^ M[j % 32]
digest = ''.join(format(x, '016x') for x in state)
return digest
Can anyone provide insights into the potential vulnerabilities of this algorithm, or suggest improvements? (In this code, I have not included the part where the word is selected or the libraries used for handling files and user input, as they are not relevant to the analysis of the algorithm's functionality.)
Not bad for a learning exercise
I actually teach writing cryptography algorithms from scratch internally at my company, so I've seen a lot of things like this. If you submitted this as your first ever hash function, I'd say you passed the challenge. That said...
You asked how secure it looks. I am concerned about the last step in the round:
state[j] = temp ^ circular_left_shift(temp, (t + j) % 64)
I strongly prefer steps that are reversible, and I see for example your multiplication is by an odd constant, which is reversible. However, this step is not reversible. For example, if the result is 0, then any temp value that is the same when rotated by t + i could result in 0, so it is not reversible. Without that, you'd have a hard time convincing me that you don't rapidly lose entropy while mixing. I recommend instead XORing state in some rotated form into state[j+1 % 32] or something like that, which is clearly reversible. If I wanted to find a collision, I think I would probably attack this point, I haven't tried, but I'd give myself maybe a 50-50 chance of getting that approach to work in an attack on this algorithm.
Other than that, I feel your padding is funky and scary. UTF-8 encoding is not super fast, expands the input data significantly, and is somewhat complex compared to regular padding algorithms. However, I think it works, and do not see any attacks, so it passes the smell test.
I like that you chose 128 rounds. Without any real cryptanalysis from experts, you're probably subject to some pretty bad differential attacks, and if you want to be secure, do a crap ton of rounds! I tyiplically choose 1,000 rounds, but my mixing is not as strong typically as one of your rounds, so I approve of 128.
Obviously, this is very slow compared to standard algorithms like SHA256 or BLAKE2d. Good. That makes sense for a hacked-together hopefully maybe secure algorithm. The digest is on the large side, at 256 bytes (10X longer than sha256).
Overall, not bad for a first try.