Skip to content

Latest commit

 

History

History
47 lines (39 loc) · 4.6 KB

unicoll.md

File metadata and controls

47 lines (39 loc) · 4.6 KB

By Marc Stevens

MD5 collisions

Since a team of researchers led by Xiaoyun Wang disclosed collisions for MD5 in 2004, it has been clear that MD5 is broken. Nevertheless it turned out hard to convince people to stop using MD5. This led to new research in making the MD5 attacks faster and stronger as well as constructing meaningful collisions with realistic attack scenarios, notable highlights are:

  • 2004: First (identical-prefix) collision by Wang et al.: pseudo-random 128-byte collisions. WY05
  • 2004-2006: Various meaningful collision examples: X.509 certificates (Lenstra, de Weger, Wang LdWW05), postscript documents (Daum, Lucks) DL06, software (Kaminsky K04/ Mikle M04 / Selinger Sel06)
  • 2006: First chosen-prefix collison: colliding X.509 certificates with distinct names SLdW07
  • 2009: Short chosen-prefix collision: rogue intermediary Certification Authority certificate SSALMOdW09
  • 2012: FLAME malware exploits unknown chosen-prefix collision attack to obtain code signing certificate for Windows Update S13
  • 2017: toy MD5 hashquines: creating documents that show their own MD5 hash: GIF, PDF, PostScript [twitter]

on Unicoll

The chosen-prefix collision attack is the most powerful collision attack because it allows two distinct arbitrarily chosen prefixes, so one typically only needs to 'hide' the attack generated collision bitstrings in the document. Another benefit is that each colliding document only contains 1 prefix. With available MD5 cryptanalysis tools (HashClash script scripts/cpc.sh, or precompiled GUI with CUDA support) one can create such chosen-prefix collisions in one day. Convenient for 2 colliding files, but not fast enough for tricks such as MD5 hashquines that require at least 128 collision attacks generating 2^128 possible files all with the same hash.

Identical-prefix collisions can be created much faster within a few seconds (using e.g. fastcoll, S06), but these generate pseudo-random looking 128-byte byte strings. The only difference occurs in these collision byte strings which makes it a bit more tricky to exploit these meaningfully in file formats.

To facilitate file format exploits with identical-prefix collisions together with Ange Albertini I've modified hashclash and created a script scripts/poc_no.sh for a special type of a fast identical-prefix collision attack that we've dubbed UniColl. It's special in that it uses a message block difference of just a single bit which is located early in the 10th-byte and actually in its least-significant bit. This is rather unique because all fast collision attack message block differences seem to require at least 3 bit differences, at least one of which on the most-significant bit of a certain byte. Moreover, with UniColl one can completely choose the content of the first 12 bytes which makes it very flexible for a number of file formats, including text-based formats. There are other single-bit collision attacks including some on lower bits of bytes, namely those used for the chosen-prefix collision attack, but these attacks are quite a bit slower.

The script is not perfect and sometimes fails (I will try to make it smarter), in which case I suggest to just retry with a slightly modified prefix (another fix is to set data=200000 in the script to a higher value, say data=1000000).

  • Clone hashclash repo
  • Follow instructions to build hashclash
  • Run [hashclashdir]/scripts/poc_no.sh <prefixfile> with a prefixfile of length N x 64 + K x 4 bytes where K<=3.