A Sequel to Cryptography

This is the second part of a series covering cryptography algorithms. If by any chance you have missed its first part, I urge you to check it out right now. It is called "An Introduction to Cryptography." In order to understand this article, it is crucial to grasp the concepts explained in that part.

I have promised that in this article I will continue showing you real-world encryption algorithms and more specifically that we are going to XOR. Throughout this part we will find out what exactly is XOR, how to implement it into an encryption algorithm, and ultimately a few techniques for breaking XOR-based encryptions. Does that sound provocative?

Let’s move on to the serious stuff. We have algorithms to understand and code, so there’s definitely no time to waste. Grab your coffee or energy drink, and let’s have fun!

All About XOR

Before we move on, allow me to give you a warning: Keep in mind that the encryption algorithm described in this article is solely for educational purposes. You should not rely on it for top-priority files and/or data. However, the author guarantees that the algorithms presented in this article will not do any harm to your computer- they do not contain any virus, Trojan, worm, spy-ware, mal-ware, or anything that’s malicious. Thanks for your time; now we can carry on.

The abbreviation XOR stands for eXclusive OR. It’s basically a logical operation; check out its truth-table below. It is true if and only if only one of the operands is true.

Operand p

Operand q

Result p≠q

0 – F

0 – F

0 – F

0 – F

1 – T

1 – T

1 – T

0 – F

1 – T

1 – T

1 – T

0 – F

So let’s memorize the definition of XOR: "either of the two, but not both."

"Why is this important to us?" you ask. It is because the XOR algorithm is based on this logical operation. Let me explain. In our algorithm we will proceed through the content of our file and apply the XOR algorithm to each and every byte. When we encrypt we do the following: XOR the plaintext (source file) with the given password. When we decrypt, we XOR the ciphertext (encrypted file) with the given password.

It is crucial to point out that the XOR process is the same for both encryption and decryption. Because we "flip" the bits or bytes on each step we can "deflip" them using the same scheme once again. There are a multitude of variations of XOR.

The following lines represent the heart of our XOR algorithm:

while ((s.read((char*)&c, sizeof(c))))

  {

       if (key[i]==”)

          i=0;

       c=key[i] ^ c;

       d.write((char*)&c, sizeof(c));

       i++;

  }

Variables: "s" is source; "d" is destination; "c" represents the character that we are going to XOR; "key" stands for the password key. Keep in mind that the key doesn’t equal the user-specified key. It goes through the following process to make it acceptable for our XORing algorithm:

for (int i=0; i<255; i++)

   key[i]=’§’;

scanf("%s",key); int temp=0;

for (i=0;i<255;i++)

{  if (key[i]==”) break;

   temp=i;

}

   strcpy(key_backup, key);

   for (i=temp; i>=0; i–)

       key[temp-i]=key[temp-i] & (~key_backup[i]);

Variables: "key" is the password specified by the user; "key_backup" is the backup of the original "key" string. This process is required to make the password ready to run.

Basically we fill the "key" string with "§" characters before loading the password. Due to this the password gets stored starting from the beginning of the key string and the rest of the 255 positions remain filled with that special character. After this, we apply the following formula to each of characters of the key:

key[temp-i]=key[temp-i] & (~key_backup[i])

Note: The "~" operator stands for the complement of the given value. To specify the complement of a value the computer does the following: it changes all of the 1s to 0s and all of the 0s to 1s from the stored binary address of the aforementioned value. For example: 01111101′s complement is 10000010. In the above formula "i" is the start expression of a "for" loop that starts from the length of the key and gets decremented by one until reaches its beginning (zero).

{mospagebreak title=Using XOR on an Example}

Let’s XOR our example text file with the password "longpassword123."

%’pl$=#-m:f49ne%!+#*"<#8(=w%#)#<8<s#>+pbml<(nz#8nf"’!zm@DF +u)!!s)?nP$(*-l+p8anw)>:/l9+p8cnp(+=g*>*ebm1y{5{uw

Check out the full C program:

#include <fstream>

#include <stdio.h>

#include <string.h>

using namespace std;

char srcfile[30], dstfile[30], key[255];

void main()

{

   printf("Source file: "); scanf("%s",srcfile);

   printf("nDestination file: "); scanf("%s",dstfile);

   printf("nPassword (up to 255 chars): ");

   //the start of the key preparation process

   for (int i=0; i<255; i++)

        key[i]=’§’;

   scanf("%s",key); int temp=0; char key_backup[255];

   for (i=0;i<255;i++)

   {

        if (key[i]==”)

           break;

        temp=i;

   }

   strcpy(key_backup, key);

   for (i=temp; i>=0; i–)

       key[temp-i]=key[temp-i] & (~key_backup[i]);

   //key preparation process is done; we’re ready to XOR!

   ifstream s(srcfile, ios::binary);

   ofstream d(dstfile, ios::binary);

   if (!s || !d)

       printf("nERROR: Couldn’t open one of the files!");

   i=0; char c;

   while ((s.read((char*)&c, sizeof(c))))

   {

       if (key[i]==”)

          i=0;

       c=key[i] ^ c;

       d.write((char*)&c, sizeof(c));

       i++;

   }

   s.close();  d.close();

   printf("nAll done. ");

}

Once again, the download button below leads to an archive that contains the ready-to-run executable and source code of the above algorithm. Enjoy!

{mospagebreak title=Breaking XOR Encryption}

Before we move on, I’d like to remind you again that the XOR is an algorithm that is same for both encryption and decryption (symmetric). As a result of this, the most crucial factor is if the attacker somehow finds out that the applied encryption algorithm was XOR. If the attacker succeeds to "guess" this then he is one step closer to breaking the encryption.

Let’s assume for a moment that the attacker finds out that the ciphertext was XORed. He does not know the password but that does not matter, since he wants to get the plaintext not the password per se. The very fact that XOR was used and not any other algorithm is satisfying for him because he knows the mathematical formula:

plaintext ^ key = ciphertext

Because XOR is symmetric the following activity can be done: if he is able to find another file that was XORed with the same password as the previous one then he is able to find out the key that was used in both using the formula below:

plaintext ^ ciphertext = key

The bottom line is that XOR is not secure due to its symmetry. If multiple files were encrypted with the same password then the attacker will be able to access all of those files because he has found out the original key.

Therefore, the security lesson of the day for XOR is that we should not let a user access both the plaintext and ciphertext versions. If we do, it will take him only a few milliseconds to figure out our password too. Of course, we should also change our password frequently. This is a great security tip too.

Now let’s try a different approach – "brute-forcing." We know that the XOR is a banal algorithm and due to this it is insanely fast (and simple). As a result, in a very short amount of time we should be able to brute-force all of the possible key combinations while, comparing the currently tried combination XORed version with the plaintext. If we get a match then we’ve "guessed" the key.

Another security lesson is that we should always prefer complex and long passwords. That includes digits, letters and special characters too. Having long and complex keys drastically reduces the efficiency of brute-forcing and, thus, the situation gets very hard and would require a lot of time and resources. The attacker will (hopefully) quit. 

The last technique we will cover here assumes that somehow the attacker finds out the language in which the document was written. Due to this, this method applies to classic text files. Also, a language that does not use accents, like English, is a benefit from the attacker’s point of view.

The attacker will proceed using the "counting coincidences" technique to find out the length of the key/password. This is possible by placing randomly 1-1 bytes from the ciphertext while comparing to the ciphertext too and counting the equal number of bytes. If the full plaintext was encrypted by one single password then more than six percent of the bytes remain equal- using the aforementioned technique.

If the plaintext was encrypted by more than one password then approximately lower than 0.4 percent of the bytes remain equal. This is irrelevant to our XOR code because we used only one key. But it is worthwhile to keep in mind for those who might get into this.

All in all, the lowest number of randomly positioned bytes which lead to the exact same key, represents the length of the original key.

Now that we’ve successfully discovered the exact length of the key, we are going to XOR the ciphertext with this length. This results in the original plaintext being XORed with itself. Think of it as "plaintext ^ plaintext." This is an amazing accomplishment because we have completely evaded the key. Of course, the resulting file still looks like it is encrypted but this is just an illusion. The fact is that it is XORed with itself only!

Let’s continue our assumption that this text file is written in English. Therefore, we know which words are the most common in this language. We also know that each byte represents a character that stands for a letter or punctuation mark in the text. So it has a meaning. Because in every text different parts and words appear multiple times, we can use an algorithm that applies XOR until we get a "meaningful text-file." This stands for a text file that does not contain gibberish.

I realize that this procedure looks very complex and hard but it is not. In fact, it is plain simple. This algorithm will be a sort of brute-forcing but it will be more efficient because we know a lot of the attributes of the text file. We know that it is a text file, we know that we need to get a meaningful output and we also know the coincidences and statistics of the most common words, letters, parts, etc. All in all, nowadays for a powerful computer this wouldn’t take more than a few minutes or hours (depending on the length and numerous other factors).

{mospagebreak title=Conclusions}

We’ve finished our XOR-related discussions. I’ve explained to you the background of the XOR logical operation, shown how to implement this to create an encryption algorithm, and ultimately we’ve endeavored in three unique ways to break it.

Assuming that you still remember the first part of this series ("An Introduction to Cryptography") right now you should have three encryption algorithms- all of them coded by you. You should have played with them by now already.

I have run out of space in this part, so I suggest that you fiddle with them to grasp the main concepts and coding workarounds until the next part arrives. That last part, like I promised, will give you a breakthrough of the latest top-notch encryption and security algorithms used nowadays. That is going to be the last part of this series – the epilogue. And it is going to be more conceptual. 

Since I am sure that I have gained your interest by now with these two parts, I will also give you a list of the best literature that’s available on the topic of cryptography in the epilogue. That should give you even more to look forward to. 

Until the epilogue arrives – keep on coding and practicing. See you at the last part!

Google+ Comments

Google+ Comments