Meaningful MD5 Collisions: Creating executables

Categories Code, CTF, Forensics, Malware

More than two years ago I worked on meaningful MD5 collisions, especially creating executables files, but I never finished my write up about this until now (hurray for having a sabbatical 😉 ). The idea behind this project was to create multiple executables with the same MD5, but with different behavior. I ended up creating a Perl script which enables you to create a simple skeleton source code which you can use as a basis for your own code, after compilation you can use the same Perl script to create the multiple executables with different behavior. This project does not show a new way to create MD5 collisions, but makes it easy to exploit the weakness by creating executables with MD5 collisions. I based my project on existing research such as HashClash, and used fastcoll to create the collisions. For further information about MD5 collisions, I would like to refer to HashClash.

The MD5 collision executables can potentially be a security issue for MD5 whitelisting, which is still used by some security products. An attacker could potentially first send an executable which is considered safe and then its counterpart which is evil. Since the files will have the same MD5 hash value the first file will have the second file white-listed. The files could further have impact on products which use MD5 hash values to uniquely identify files, such as certain forensics software.

The whole project was inspired by my first MD5 collision experience while playing SmashTheStack IO and by forensic products using MD5 hash values as unique identifiers for files.

Collision files

The fastcoll software is able to create MD5 collisions strings. These strings will (slightly) differ in contents but will have the same MD5 hash values.

Example of generating collisions with fastcoll:

C:\>fastcoll_v1.0.0.5.exe -o file1 file2
MD5 collision generator v1.5
by Marc Stevens (http://www.win.tue.nl/hashclash/)

Using output filenames: 'file1' and 'file2'
Using initial value: 0123456789abcdeffedcba9876543210

Generating first block: .
Generating second block: W..
Running time: 0.973 s

The files generated by fastcoll in this example are shown below with their differences marked in yellow:

collision1 collision2

Since we now have two different strings with the same MD5 we can use these strings in our own code to create two different files with the same MD5. We can use the collision string to differentiate between the two different files. The collision strings will be different by a couple of bytes, these different bytes can be used to create checks to find out which of the two files is currently running. In the example above we can use the char \xBB as a check (check-byte) if the collision is the first string or the second, when the check-byte is found the executable will then execute a friendly code, and when the check-byte is not found the executable will execute an evil code.

overview3.1

The file layout will look like this:

FileLayout

So, each file contains both the friendly code and the evil code, but only one of the two will be executed depending on the collision found in the file.

To create more than two files we can chain multiple collisions.

overview2.1

The following table shows the amount of collisions needed for the amount of collision files you want to create:

Collisions Amount of files
1 2
2 4
3 8
4 16
5 32
.. ..
12 4096
13 8192

Collision script

The approach described above has been automated in the Perl script in this article. The current code looks for characters which can be found in one of the collisions but not in the other, it would of course also be possible to just check a specific byte.

The first part of the script enables you to generate a collision template (or skeleton code) for the executable.

Output of the script for 3 executable files:

C:\>perl Collider.pl -c 3 example3.c

[*] Meaningful MD5 Collision executable generator
 Thijs 'Thice' Bosschert
 v0.4 - 16-02-2012

[*] Creating collision template example3.c
[*] Done! Time needed: 0 seconds

The contents of example3.c:

int main () {
  char output[1];

  // Collision space holder 1
  char collision1[] = "AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00AA00";
  // Collision check character space holder 1
  char checkchar1[] = "\x01"; 
  // Check if collision contains check character
  if (strchr(collision1,checkchar1[0])) {
     output[0] = '0';
  } else {
     output[0] = '1';
  }

  // Random (0-9a-zA-Z) buffer data
  char buffer1[] = "Woalv8RPSs5GEIPGHI5dYedxPTS8HXf3LAzz1CXQuqeXD3ms8xSTI8F2lQ4IWfOILMujPgFt3f0Vou561yKzYaWaMepCrbUslqBjBfeuomzxrYSjm6jHDZxMEMHXS55l2c2U3FfsGDpUtjxqPb8GphxNVLKm8pB9jE9IOXASdW7R2BYp2HObacqx";

  
  // Collision space holder 2
  char collision2[] = "AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01AA01";
  // Collision check character space holder 2
  char checkchar2[] = "\x02"; 
  // Check if collision contains check character
  if (strchr(collision2,checkchar2[0])) {
     output[1] = '0';
  } else {
     output[1] = '1';
  }

  // Random (0-9a-zA-Z) buffer data
  char buffer2[] = "WvJBPdgZakYEEhubLkvz9j6KzSlmL1ogktznPJQoSjk8FwaZvVg2gBOFjzu2db1QnfUOwh7vPLZgmUlq4wFIlO5psHvOb6TMS7LcSePGcEIFqavEfhqGCGnXCFGf6I8vffYp65oUTZe34VToizABPhyAdKoQbzooXJRAbxoTR6g4nbgwOXG8Wf3l";

  // Output something different for each outcome

  if (strstr(output,"00")) { printf("[!] Placeholder 1!\n"); }
  if (strstr(output,"01")) { printf("[!] Placeholder 2!\n"); }
  if (strstr(output,"10")) { printf("[!] Placeholder 3!\n"); }

  char amount[] = "#####3#####";
  return 0;
}

This code can be used to add your own code to it by replacing the Placeholders. The code contains placeholders for the collisions we will add to the executable in the next part. Before the collisions will be added the code needs to be compiled first. It is important to have no optimization or anything when compiling, since that might screw up the placeholders. During my research I used Dev C++ to compile the code.

A snippet of the compiled skeleton code is shown below which clearly shows the collision placeholders:

Placeholder

To replace the Placeholders with the MD5 collisions we use the Perl script again, the compiled code is used as input, after which the script will generate the collisions:

C:\>perl Collider.pl  -b example3.exe example3_done.exe

[*] Meaningful MD5 Collision executable generator
    Thijs 'Thice' Bosschert
    v0.4 - 16-02-2012

[*] Amount of output files to be generated: 3
[*] Amount of collisions needed: 2
[*] Starting creating collisions
[*] Creating collisions for step 1
[!] Found \x00, restarting collision for step 1.
[*] Creating collisions for step 1
[*] Found check char: \xff
[*] Creating collisions for step 2
[*] Found check char: \xf4
[*] Writing 3 collision files
[*] Done! Time needed: 49 seconds

The Placeholders in the executable are replaced with the newly created MD5 collisions:

Placeholder2

If we compare two of the files at the location of a collision we can see the following differences:

Placeholder3

The files we generated now have the same MD5:

C:\>md5deep64.exe example3_done.exe.*
f691d5a79a63397ca8a9cdc5d3d9dd7c  C:\example3_done.exe.01
f691d5a79a63397ca8a9cdc5d3d9dd7c  C:\example3_done.exe.10
f691d5a79a63397ca8a9cdc5d3d9dd7c  C:\example3_done.exe.00

But will have a different SHA1 hash value:

C:\>sha1deep64.exe example3_done.exe.*
6fa71cc3d1a8c83e3fcfa32dfdf6a1d92b5b9657  C:\example3_done.exe.00
e199d2e84084a538afe8feb0eae09abd5182f72e  C:\example3_done.exe.10
ed7a914701e413ec060fd798a93236e69a6fef86  C:\example3_done.exe.01

Executing the created files we can see they have a different output:

C:\>example3_done.exe.00
[!] Placeholder 1!

C:\>example3_done.exe.01
[!] Placeholder 2!

C:\>example3_done.exe.10
[!] Placeholder 3!

The same approach can be used to create a lot more collisions, 5000 for example:

C:\>Collider.pl -b 5000.exe 5000.exe_

[*] Meaningful MD5 Collision executable generator
    Thijs 'Thice' Bosschert
    v0.4 - 16-02-2012

[*] Amount of output files to be generated: 5000
[*] Amount of collisions needed: 13
[*] Starting creating collisions
[*] Creating collisions for step 1
[!] Found \x00, restarting collision for step 1.
[*] Creating collisions for step 1
[*] Found check char: \xd5
[*] Creating collisions for step 2
[!] Found \x00, restarting collision for step 2.
[*] Creating collisions for step 2
[!] Found \x00, restarting collision for step 2.
[*] Creating collisions for step 2
[*] Found check char: \xe1
[*] Creating collisions for step 3
[*] Found check char: \x6d
[*] Creating collisions for step 4
[*] Found check char: \x2e
[*] Creating collisions for step 5
[*] Found check char: \x1e
[*] Creating collisions for step 6
[*] Found check char: \x02
[*] Creating collisions for step 7
[*] Found check char: \x6a
[*] Creating collisions for step 8
[*] Found check char: \xa4
[*] Creating collisions for step 9
[*] Found check char: \x04
[*] Creating collisions for step 10
[!] Found \x00, restarting collision for step 10.
[*] Creating collisions for step 10
[*] Found check char: \x87
[*] Creating collisions for step 11
[!] Found \x00, restarting collision for step 11.
[*] Creating collisions for step 11
[!] Found \x00, restarting collision for step 11.
[*] Creating collisions for step 11
[*] Found check char: \x83
[*] Creating collisions for step 12
[*] Found check char: \x69
[*] Creating collisions for step 13
[*] Found check char: \x15
[*] Writing 5000 collision files
[*] Done! Time needed: 971 seconds

Executable collision Perl script source code

The script is just some proof of concept code which could use some fixes. But since I am not working actively on this project anymore I just decided to release it as is.

When the script is executed without arguments it will show the usage:

C:\>perl Collider.pl

[*] Meaningful MD5 Collision executable generator
    Thijs 'Thice' Bosschert
    v0.4 - 16-02-2012

[*] Usage:
      perl Collider.pl <option> <variables>

[*] Options:
      -c : Create new C template
           -c <amount> <output filename>
      -b : Build hashcollision files
           -b <input filename> <output filename>
[*] Examples:
      perl Collider.pl -c 16 output_file.c
      perl Collider.pl -b input_file.exe output_file.exe

The script has only been tested on Windows and will need some (small) changes to work on Linux.

The full script:

#!/usr/bin/perl -w

# Known bug: not handling 2, 4, 8 etc correctly

$fastcoll = "fastcoll_v1.0.0.5.exe";

# Print header
print "\n[*] Meaningful MD5 Collision executable generator\n".
      "    Thijs 'Thice' Bosschert\n".
      "    v0.4 - 16-02-2012\n\n";

# Check command line arguments
if (!$ARGV[0] || !$ARGV[1] || !$ARGV[2]) {
	error(0);
}

# Option -c : Create new C template
if ($ARGV[0] eq "-c") {

  # Option -c <amount> <output filename> 
  # Check if amount argument is an number
  if ($ARGV[1] !~ m/^[0-9]*$/) {
    error(3,$ARGV[1]);
  }
  $amount = $ARGV[1];
  # Calculating needed collisions
  $tmpcnt = "1";
  while ($tmpcnt < $amount) { 
  	$tmpcnt*=2; $size++;
  }

  # Check if output file already exists
  if (-e $ARGV[2]) {
	  error(1,$ARGV[2]);
  } 
 
  # Open output file
  open(OUTFILE,">>".$ARGV[2]) || error(4,$ARGV[2]);
  print "[*] Creating collision template ".$ARGV[2]."\n";

  # Writing first part of outputfile
  print OUTFILE "int main () {
  char output[".($size-1)."];\n";

  # Writing collision templates to output file
  for($count=1;$count<=$size;$count++) {
  print OUTFILE "
  // Collision space holder ".$count."
  char collision".$count."[] = \"".pattern($count-1)."\";
  // Collision check character space holder ".$count."
  char checkchar".$count."[] = \"\\x".sprintf("%.2x", $count)."\"; 
  // Check if collision contains check character
  if (strchr(collision".$count.",checkchar".$count."[0])) {
     output[".($count-1)."] = '0';
  } else {
     output[".($count-1)."] = '1';
  }

  // Random (0-9a-zA-Z) buffer data
  char buffer".$count."[] = \"".random()."\";

  ";
  } 

  # Writing different output options to output file
  print OUTFILE "// Output something different for each outcome\n\n";
  # Count in binary
  $bincnt = sprintf("%b", $amount-1);
  for($count=1;$count<=$amount;$count++) {
    print OUTFILE "  if (strstr(output,\"".sprintf("%.".length($bincnt)."b", $count-1)."\")) { printf(\"[!] Placeholder ".$count."!\\n\"); }\n";
	}
  
  # print end of output file including amount
  print OUTFILE "
  char amount[] = \"#####".$amount."#####\"\;
  return 0;\n}\n";
  print "[*] Done! Time needed: ".(time - $^T)." seconds\n\n";
  exit;
}

# Option -b : Build hashcollision files
if ($ARGV[0] eq "-b") {
  # Check if input file exists
  if (!-e $ARGV[1]) {
	  error(2,$ARGV[1]);
  }
  # Check if output file exists
  if (-e $ARGV[2]) {
	  error(1,$ARGV[2]);
  }
  
  # Open the input file
  open (INPUTFILE,$ARGV[1]) || error(7,$ARGV[1]);
  local $/;
  binmode INPUTFILE;
  $inputfile = <INPUTFILE>;
  $workfile = $inputfile;
  $count = "0";
  if ($workfile =~ /\#\#\#\#\#([0-9]*)\#\#\#\#\#/s) { $amount = $1; } else { error(8); }
  print "[*] Amount of output files to be generated: ".$amount."\n";
  # Calculating needed collisions
  $tmpcnt = "1";
  while ($tmpcnt < $amount) { 
  	$tmpcnt*=2; $size++;
  }
  print "[*] Amount of collisions needed: ".$size."\n";
	print "[*] Starting creating collisions\n";
  while (!$stop) {
    # Split file before the pattern
    $pat = pattern($count);
    if ($workfile =~ /(.*)$pat/s) {
    	print "[*] Creating collisions for step ".($count+1)."\n";
    	$header = $1; 
    	# Write header to temp file
    	if (-e "MD5_collider_tmp_header") { error(5); } 
    	open(TMPFILE,">>MD5_collider_tmp_header") || error(6);
    	binmode TMPFILE; 
    	print TMPFILE $header;
    	close(TMPFILE); 
    	# Execute fastcol
    	$fastcollout = `$fastcoll -q -p MD5_collider_tmp_header -o MD5_collider_tmp_collision_1 MD5_collider_tmp_collision_2`;
      if ($fastcollout !~ "Generating second block") { error(10); }
    	open (TMPFILE1,"MD5_collider_tmp_collision_1");
    	local $/;
    	binmode TMPFILE1;
    	$collision1 = <TMPFILE1>;
    	close(TMPFILE1);
    	if ($collision1 =~ /(.{128})$/s) { $collision[0] = $1; } 
    	open (TMPFILE2,"MD5_collider_tmp_collision_2");
    	local $/;
    	binmode TMPFILE2;
    	$collision2 = <TMPFILE2>;
    	close(TMPFILE2);
    	if ($collision2 =~ /(.{128})$/s) { $collision[1] = $1; }
    	if ($collision[0] !~ /\x00/ && $collision[1] !~ /\x00/) {
        # Checking which characters differ in the collision
        @collision1 = (map { sprintf("%.2x",ord($_)) } split (//, $collision[0]));
	      $collisionhex = $collision[1];
	      $collisionhex =~ s/(.)/sprintf("%.2x.",ord($1))/eg;
        if ($collisionhex !~ $collision1[19]) {
        	$step{$count}{'char'} = $collision1[19];
        } elsif ($collisionhex !~ $collision1[45]) {
          $step{$count}{'char'} = $collision1[45];
        } elsif ($collisionhex !~ $collision1[59]) {
        	$step{$count}{'char'} = $collision1[59];
        } elsif ($collisionhex !~ $collision1[83]) {
        	$step{$count}{'char'} = $collision1[83];
        } elsif ($collisionhex !~ $collision1[109]) {
        	$step{$count}{'char'} = $collision1[109];
        } elsif ($collisionhex !~ $collision1[110]) {
        	$step{$count}{'char'} = $collision1[110];
        } elsif ($collisionhex !~ $collision1[123]) {
        	$step{$count}{'char'} = $collision1[123];
        }
        if (!$step{$count}{'char'}) { 
          cleanup();
        	print "[!] Found no usefull check char in collision, restarting collision for step ".($count+1).".\n"; 
        	next; 
        } 
        print "[*] Found check char: \\x".($step{$count}{'char'})."\n";

        $step{$count}{0} = $collision[0];
        $step{$count}{1} = $collision[1];
        # Adding collision 1 to current working file
        $tmpchr = chr($count+1);
        $newchr = chr(hex($step{$count}{'char'}));
        $workfile =~ s/$pat\x00$tmpchr\x00\x00/$collision[0]\x00$newchr\x00\x00/;
        $count++;
      } else {
        print "[!] Found \\x00, restarting collision for step ".($count+1).".\n";
      }
      # Cleaning up temp files
      cleanup();
    } else { $stop = "1"; }
  }
  print "[*] Writing ".$amount." collision files\n";
  $binsize = sprintf("%b", $amount);
  for ($i="0";$i<$amount;$i++) {
    $tmpfile = $inputfile;
    $bincounter = sprintf("%.".length($binsize)."b", $i);
    @whichcollision = split(//,$bincounter);
    $count = 0;
    foreach $thiscollision (@whichcollision) {
    	$pat = pattern($count);
    	$tmpchr = chr($count+1);
    	$newchr = chr(hex($step{$count}{'char'}));
    	$tmpfile =~ s/$pat\x00$tmpchr\x00\x00/$step{$count}{$thiscollision}\x00$newchr\x00\x00/;
    	$count++;
    }
    open(OUTPUTFILE,">>".$ARGV[2].".".$bincounter);
    binmode OUTPUTFILE;
    print OUTPUTFILE $tmpfile;
    close(OUTPUTFILE);
  }
  print "[*] Done! Time needed: ".(time - $^T)." seconds\n\n";
  exit;
}

error(0);

# Random pattern create routine
sub random {
  $random = "";
  @random = ("A".."Z","a".."z","0".."9");
  for ($r=0;$r<184;$r++) {
    $random .= $random[int(rand(@random))];
  }
  return $random;
}  

# Pattern create routine
sub pattern {
  ($cnt) = @_;
  $pattern = "";
  $cnt2 = "0";
  @abc = ("AA".."ZZ");
  @one23 = ("00".."99");
  if ($cnt >= 100) { 
  	$cnt2 = int($cnt / 100);
    $cnt = $cnt - (100 * $cnt2);
  }
  for ($a=0;$a<32;$a++) {
    $pattern .= $abc[$cnt2].$one23[$cnt];
  }
	return $pattern;
}

sub cleanup {
  if (-e "MD5_collider_tmp_header") { `del MD5_collider_tmp_header`; }
  if (-e "MD5_collider_tmp_collision_1") { `del MD5_collider_tmp_collision_1`; }
  if (-e "MD5_collider_tmp_collision_2") { `del MD5_collider_tmp_collision_2`; }
}

# Error messages
sub error {
 ($ern,$errvar) = @_;
  if ($ern == "0") { print "[*] Usage: \n";
  	  if ($0 =~ /\\([^\\]*.exe)/) {
	    print "      $1 <option> <variables>\n\n";
    } else {
	    print "      perl $0 <option> <variables>\n\n";  	
    }
    print "[*] Options:\n".
          "      -c : Create new C template\n".
          "           -c <amount> <output filename>\n".
          "      -b : Build hashcollision files\n".
          "           -b <input filename> <output filename>\n".
          "[*] Examples:\n";
  	if ($0 =~ /\\([^\\]*.exe)/) {
	    print "      $1 -c 16 output_file.c\n".
	          "      $1 -b input_file.exe output_file.exe\n\n";
    } else {
	    print "      perl $0 -c 16 output_file.c\n".
	          "      perl $0 -b input_file.exe output_file.exe\n\n";
    }      
  }
  if ($ern == "1") { print "[!] Error: File ".$errvar." already exist!\n\n"; error(0); }
  if ($ern == "2") { print "[!] Error: File ".$errvar." does not exist!\n\n"; error(0); }
  if ($ern == "3") { print "[!] Error: Amount ".$errvar." isn't a number!\n\n"; }
  if ($ern == "4") { print "[!] Error: Can not open outputfile ".$errvar." \n\n"; }
  if ($ern == "5") { print "[!] Error: Temp file MD5_collider_tmp_header already exist, please remove.\n"; }
  if ($ern == "6") { print "[!] Error: Can not write to temp file MD5_collider_tmp_header.\n"; }
  if ($ern == "7") { print "[!] Error: Can not open file ".$errvar."\n\n"; error(0); }
  if ($ern == "8") { print "[!] Error: Could not identify amount of output files from source file\n\n"; }
  if ($ern == "10") { print "[!] Error: ".$fastcoll." doesn't work like it should.\n\n"; }
  cleanup();
  exit;	
}

ebCTF

In 2013, I created a challenge for ebCTF (BIN400) which involved creating the kind of executable files as named in this write-up. 14 teams managed to solve this question as can be seen on the scoreboard. Challenge description:

Please upload 5 Windows console executable files with the same MD5 but with different printed outputs (file type: MS Windows, PE32 executable, console)

The output for the files should be:
File1: All Eindbazen are wearing wooden shoes
File2: All Eindbazen live in a windmill
File3: All Eindbazen grow their own tulips
File4: All Eindbazen smoke weed all day
File5: All Eindbazen are cheap bastards

NOTE: The goal of this challenge is NOT to attack this webservice, this is NOT a WEB challenge.

No Comments

Leave a Reply

Your email address will not be published. Required fields are marked *