This is the second part of a series covering cryptography algorithms. If by any chance you have missed its first part, I urge you to check it out right now. It is called "An Introduction to Cryptography." In order to understand this article, it is crucial to grasp the concepts explained in that part.
Before we move on, I'd like to remind you again that the XOR is an algorithm that is same for both encryption and decryption (symmetric). As a result of this, the most crucial factor is if the attacker somehow finds out that the applied encryption algorithm was XOR. If the attacker succeeds to "guess" this then he is one step closer to breaking the encryption.
Let's assume for a moment that the attacker finds out that the ciphertext was XORed. He does not know the password but that does not matter, since he wants to get the plaintext not the password per se. The very fact that XOR was used and not any other algorithm is satisfying for him because he knows the mathematical formula:
plaintext ^ key = ciphertext
Because XOR is symmetric the following activity can be done: if he is able to find another file that was XORed with the same password as the previous one then he is able to find out the key that was used in both using the formula below:
plaintext ^ ciphertext = key
The bottom line is that XOR is not secure due to its symmetry. If multiple files were encrypted with the same password then the attacker will be able to access all of those files because he has found out the original key.
Therefore, the security lesson of the day for XOR is that we should not let a user access both the plaintext and ciphertext versions. If we do, it will take him only a few milliseconds to figure out our password too. Of course, we should also change our password frequently. This is a great security tip too.
Now let's try a different approach - "brute-forcing." We know that the XOR is a banal algorithm and due to this it is insanely fast (and simple). As a result, in a very short amount of time we should be able to brute-force all of the possible key combinations while, comparing the currently tried combination XORed version with the plaintext. If we get a match then we've "guessed" the key.
Another security lesson is that we should always prefer complex and long passwords. That includes digits, letters and special characters too. Having long and complex keys drastically reduces the efficiency of brute-forcing and, thus, the situation gets very hard and would require a lot of time and resources. The attacker will (hopefully) quit.
The last technique we will cover here assumes that somehow the attacker finds out the language in which the document was written. Due to this, this method applies to classic text files. Also, a language that does not use accents, like English, is a benefit from the attacker's point of view.
The attacker will proceed using the "counting coincidences" technique to find out the length of the key/password. This is possible by placing randomly 1-1 bytes from the ciphertext while comparing to the ciphertext too and counting the equal number of bytes. If the full plaintext was encrypted by one single password then more than six percent of the bytes remain equal- using the aforementioned technique.
If the plaintext was encrypted by more than one password then approximately lower than 0.4 percent of the bytes remain equal. This is irrelevant to our XOR code because we used only one key. But it is worthwhile to keep in mind for those who might get into this.
All in all, the lowest number of randomly positioned bytes which lead to the exact same key, represents the length of the original key.
Now that we've successfully discovered the exact length of the key, we are going to XOR the ciphertext with this length. This results in the original plaintext being XORed with itself. Think of it as "plaintext ^ plaintext." This is an amazing accomplishment because we have completely evaded the key. Of course, the resulting file still looks like it is encrypted but this is just an illusion. The fact is that it is XORed with itself only!
Let's continue our assumption that this text file is written in English. Therefore, we know which words are the most common in this language. We also know that each byte represents a character that stands for a letter or punctuation mark in the text. So it has a meaning. Because in every text different parts and words appear multiple times, we can use an algorithm that applies XOR until we get a "meaningful text-file." This stands for a text file that does not contain gibberish.
I realize that this procedure looks very complex and hard but it is not. In fact, it is plain simple. This algorithm will be a sort of brute-forcing but it will be more efficient because we know a lot of the attributes of the text file. We know that it is a text file, we know that we need to get a meaningful output and we also know the coincidences and statistics of the most common words, letters, parts, etc. All in all, nowadays for a powerful computer this wouldn't take more than a few minutes or hours (depending on the length and numerous other factors).