Hash collision
From Wikipedia, the free encyclopedia
In computer science, a hash collision or hash clash is a situation that occurs when two distinct inputs into a hash function produce identical outputs.
Any hash function with more possible inputs than possible outputs has collisions, though with a well-designed hash function they occur less often than with a poorly designed one, or are more difficult to find. In certain specialized applications, where a relatively small number of possible inputs are all known ahead of time, it is possible to construct a perfect hash function, which maps every input to a different output. However, many hash functions, including most cryptographic hash functions, produce a fixed-size output from an arbitrarily long message. In such a design there will always be collisions: by the pigeonhole principle, an unbounded set of inputs cannot be mapped one-to-one onto a finite set of outputs, so some hash values must correspond to a very large number of possible inputs.
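The pigeonhole argument can be made concrete with a deliberately tiny hash. The sketch below is only an illustration: `tiny_hash` is a hypothetical 8-bit function (the first byte of SHA-256), not something any real system would use. With 257 distinct inputs and only 256 possible outputs, at least one collision is guaranteed.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[0]


def first_collision(inputs):
    """Return the first pair of inputs that share a hash value, or None."""
    seen = {}  # hash value -> first input that produced it
    for item in inputs:
        h = tiny_hash(item)
        if h in seen:
            return seen[h], item
        seen[h] = item
    return None


# 257 distinct inputs, 256 possible outputs: a collision must exist.
inputs = [str(i).encode() for i in range(257)]
a, b = first_collision(inputs)
print(a, b, tiny_hash(a), tiny_hash(b))
```

In practice the first collision shows up long before all 257 inputs are tried; the pigeonhole principle only gives the upper limit.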
In searching
An efficient method of searching is to process a lookup key with a hash function and use the resulting hash value as an index into an array of data. The resulting data structure is called a hash table. As long as different keys map to different indices, a lookup can be performed in constant time. When several lookup keys map to the same index, however, a hash collision occurs. The most popular ways of dealing with this are chaining (building a linked list of values for each array index) and open addressing (searching other array indices nearby for an empty space). Both of these, however, degrade the worst-case lookup time so that it is linear in the number of elements stored.
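A minimal sketch of chaining, under a few stated assumptions: the class name `ChainedHashTable` and its methods are hypothetical, Python lists stand in for the linked lists, and Python's built-in `hash` is used as the hash function. It is meant to show how colliding keys end up in the same bucket, not to be a production table.

```python
class ChainedHashTable:
    """Illustrative hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets: int = 8):
        # One chain (here a plain list) per array index.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        # Map the key's hash value onto an array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key, possibly a collision

    def get(self, key):
        # Worst case: every key landed in this bucket, so the scan is linear.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(num_buckets=4)
# Assumption: in CPython, hash(n) == n for small non-negative integers,
# so the keys 1, 5 and 9 all map to index 1 and share one chain.
for key in (1, 5, 9):
    table.put(key, f"value-{key}")
print(table.get(5), [len(bucket) for bucket in table.buckets])
```

If every key collides, `get` has to walk a single chain holding all elements, which is the linear worst case described above.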
Collision resistance
Given: a hash function H and two distinct passwords x and y.
Weak collision resistance (also called second-preimage resistance): for a given x, it is hard to find a y ≠ x such that H(x) = H(y). A user supplies a value, in this example a password, called the initial value (x). If the hash function H is weakly collision resistant, the probability of finding a second password with the same hash value as the initial one is negligible in the output length of the hash function.
Strong collision resistance: it is hard to find any pair x ≠ y such that H(x) = H(y). If the hash function H is strongly collision resistant, the probability of finding any two distinct passwords with the same hash value is negligible in the output length of the hash function.
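The difference between the two notions can be seen by attacking a deliberately weakened hash. The sketch below assumes a hypothetical function `h` that keeps only the top 16 bits of SHA-256, so that both brute-force searches finish quickly; the helper names and the `guess-`/`msg-` candidate strings are made up for the illustration. Finding a second input for a fixed x (the weak property) typically takes far more attempts than finding any colliding pair (the strong property).

```python
import hashlib
from itertools import count

BITS = 16  # assumption: truncated output so brute force is feasible


def h(msg: bytes) -> int:
    """Hypothetical weak hash: the top BITS bits of SHA-256."""
    digest = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return digest >> (256 - BITS)


def find_second_preimage(x: bytes):
    """Search for some y != x with h(y) == h(x); return (y, attempts)."""
    target = h(x)
    for i in count():
        y = b"guess-%d" % i
        if y != x and h(y) == target:
            return y, i + 1


def find_collision():
    """Search for any pair a != b with h(a) == h(b); return (a, b, attempts)."""
    seen = {}  # hash value -> message that produced it
    for i in count():
        m = b"msg-%d" % i
        value = h(m)
        if value in seen:
            return seen[value], m, i + 1
        seen[value] = m


x = b"correct horse battery staple"
y, second_preimage_attempts = find_second_preimage(x)
a, b, collision_attempts = find_collision()
print("second preimage attempts:", second_preimage_attempts)
print("collision attempts:", collision_attempts)
```

With a 16-bit output, the fixed-target search is expected to need on the order of 2^16 attempts, while the free-pair search usually succeeds after a few hundred. This is why strong collision resistance is the harder property for a hash function to provide.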
In cryptography
One desirable property of cryptographic hash functions is that it is computationally infeasible to find a collision. If producing a collision is not feasible, the hash value of an input can be signed and published to certify that the input is unchanged. Feasible in this context refers to any algorithm whose asymptotic running time is polynomial in the output length of the hash function, which is usually much faster than a brute-force birthday attack; for an n-bit hash, a birthday attack needs on the order of 2^(n/2) hash evaluations.
The process of finding two arbitrary values whose hashes collide is called a collision attack; the process of finding a value whose hash matches a given hash value is called a preimage attack. A successful preimage attack is a much more serious break than a successful collision attack.
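The cost of a brute-force birthday attack can be estimated with the standard birthday approximation. The sketch below assumes the hash outputs behave like independent, uniformly random values, which is an idealization; the function names are hypothetical. Under that assumption, k evaluations of an n-bit hash contain a collision with probability about 1 - exp(-k(k-1)/2^(n+1)), and the 50% point is near 1.18 · 2^(n/2).

```python
import math


def collision_probability(k: int, bits: int) -> float:
    """Approximate chance that k uniformly random `bits`-bit values collide."""
    return -math.expm1(-k * (k - 1) / 2 ** (bits + 1))


def evaluations_for_half(bits: int) -> float:
    """Approximate number of evaluations for a 50% collision chance."""
    return math.sqrt(2 * math.log(2)) * 2 ** (bits / 2)


for bits in (32, 128, 160, 256):
    print(bits, f"{evaluations_for_half(bits):.3e}")
```

The work grows exponentially in the output length, so it is not polynomial in n; this is why a hash function is only regarded as broken when collisions can be found substantially faster than this bound.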