Thice Security

De Star 6, 1601 MH Enkhuizen
Thice Security

Meaningful MD5 Collisions: Creating executables

19/01/2015, by Thice, category Code, CTF, Forensics, Malware

More than two years ago I worked on meaningful MD5 collisions, especially creating executables files, but I never finished my write up about this until now (hurray for having a sabbatical ūüėČ ). The idea behind this project was to create multiple executables with the same MD5, but with different behavior. I ended up creating a Perl script which enables you to create a simple skeleton source code which you can use as a basis for your own code, after compilation you can use the same Perl script to create the multiple executables with different behavior. This project does not show a new way to create MD5 collisions, but makes it easy to exploit the weakness by creating executables with MD5 collisions.¬†I based my project on existing research such as HashClash, and used fastcoll¬†to create the collisions. For further information about MD5 collisions, I would like to refer to HashClash.

The MD5 collision executables can potentially be a security issue for MD5 whitelisting, which is still used by some security products. An attacker could potentially first send an executable which is considered safe and then its counterpart which is evil. Since the files will have the same MD5 hash value the first file will have the second file white-listed. The files could further have impact on products which use MD5 hash values to uniquely identify files, such as certain forensics software.

The whole project was inspired by my first MD5 collision experience while playing SmashTheStack IO and by forensic products using MD5 hash values as unique identifiers for files.

Collision files

The fastcoll software is able to create MD5 collisions strings. These strings will (slightly) differ in contents but will have the same MD5 hash values.

Example of generating collisions with fastcoll:

The files generated by fastcoll in this example are shown below with their differences marked in yellow:

collision1 collision2

Since we now have two different strings with the same MD5 we can use these strings in our own code to create two different files with the same MD5. We can use the collision string to differentiate between the two different files. The collision strings will be different by a couple of bytes, these different bytes can be used to create checks to find out which of the two files is currently running. In the example above we can use the char \xBB as a check (check-byte) if the collision is the first string or the second, when the check-byte is found the executable will then execute a friendly code, and when the check-byte is not found the executable will execute an evil code.


The file layout will look like this:


So, each file contains both the friendly code and the evil code, but only one of the two will be executed depending on the collision found in the file.

To create more than two files we can chain multiple collisions.


The following table shows the amount of collisions needed for the amount of collision files you want to create:

Collisions Amount of files
1 2
2 4
3 8
4 16
5 32
.. ..
12 4096
13 8192

Collision script

The approach described above has been automated in the Perl script in this article. The current code looks for characters which can be found in one of the collisions but not in the other, it would of course also be possible to just check a specific byte.

The first part of the script enables you to generate a collision template (or skeleton code) for the executable.

Output of the script for 3 executable files:

The contents of example3.c:

This code can be used to add your own code to it by replacing the Placeholders. The code contains placeholders for the collisions we will add to the executable in the next part. Before the collisions will be added the code needs to be compiled first. It is important to have no optimization or anything when compiling, since that might screw up the placeholders. During my research I used Dev C++ to compile the code.

A snippet of the compiled skeleton code is shown below which clearly shows the collision placeholders:


To replace the Placeholders with the MD5 collisions we use the Perl script again, the compiled code is used as input, after which the script will generate the collisions:

The Placeholders in the executable are replaced with the newly created MD5 collisions:


If we compare two of the files at the location of a collision we can see the following differences:


The files we generated now have the same MD5:

But will have a different SHA1 hash value:

Executing the created files we can see they have a different output:

The same approach can be used to create a lot more collisions, 5000 for example:

Executable collision Perl script source code

The script is just some proof of concept code which could use some fixes. But since I am not working actively on this project anymore I just decided to release it as is.

When the script is executed without arguments it will show the usage:

The script has only been tested on Windows and will need some (small) changes to work on Linux.

The full script:


In 2013, I created a challenge for ebCTF (BIN400) which involved creating the kind of executable files as named in this write-up. 14 teams managed to solve this question as can be seen on the scoreboard. Challenge description:

Please upload 5 Windows console executable files with the same MD5 but with different printed outputs (file type: MS Windows, PE32 executable, console)

The output for the files should be:
File1: All Eindbazen are wearing wooden shoes
File2: All Eindbazen live in a windmill
File3: All Eindbazen grow their own tulips
File4: All Eindbazen smoke weed all day
File5: All Eindbazen are cheap bastards

NOTE: The goal of this challenge is NOT to attack this webservice, this is NOT a WEB challenge.

So, what do you think ?