Endgame

IBM : developerWorks : Security : Education - online courses

How to break a substitution cipher, part 2 (page 2 of 5)

Cryptanalysis of the Caesar cipher is nowhere near as hard as breaking a modern cipher, but many of the same principles apply to both. Let us do some simple statistics. It turns out that the letters of English (or Latin) occur with quite different frequencies. For instance, this tutorial has a lot more "E's" in it than it does "Q's". Encrypting a message with a Caesar cipher does not change the statistical distribution of letters in the message; it just makes different letters occupy the same frequencies. That is, if a particular Caesar cipher key maps E's to Q's, you'll find the encrypted version of this tutorial has exactly as many Q's as the original did E's.
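To see this for yourself, here is a minimal Python sketch. The function names (`caesar_encrypt`, `letter_counts`) and the sample sentence are made up for illustration; the shift of 12 is chosen only because it happens to send E to Q.

```python
from collections import Counter
import string

def caesar_encrypt(text, shift):
    """Shift each ASCII letter by `shift` places; leave everything else alone."""
    result = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

def letter_counts(text):
    """Count letters only, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)

# Illustrative sample text (an assumption, not taken from any real message).
plaintext = "Encrypting a message does not change the distribution of letters"
ciphertext = caesar_encrypt(plaintext, 12)  # a shift of 12 sends E to Q

plain_counts = letter_counts(plaintext)
cipher_counts = letter_counts(ciphertext)

print("E's in the plain text: ", plain_counts["E"])
print("Q's in the cipher text:", cipher_counts["Q"])
```

The two printed numbers are the same, and if you sort the counts in each table you get the same list: the distribution has only been relabeled, not changed.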

Fair enough, but how does an attacker know how many E's were in the original message without knowing the message? The attacker does not need to know this exactly; it is enough to know that E's make up a whopping 13% or so of normal English prose (counting only letters, not punctuation or spaces). In a reasonably long message, any letter that makes up about 13% of the cipher text is extremely likely to represent an E. Similarly, the most common remaining letters in the cipher text probably represent "T's" and "A's". With a Caesar cipher, one confident match is enough, since every letter is shifted by the same amount. This is the low entropy (rate-of-language) of English coming back to haunt us. All you need to do is use up all the letters, make sure the message looks like a message, and you are done!
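The attack can be sketched in a few lines of Python. This is only an illustration under a strong assumption: that the most common letter in the cipher text stands for E, which tends to hold for long English messages but can easily fail for short ones. The helper names (`shift_text`, `guess_shift`) and the sample message are hypothetical.

```python
from collections import Counter
import string

def shift_text(text, shift):
    """Shift each ASCII letter by `shift` places (negative values shift back)."""
    result = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

def guess_shift(ciphertext):
    """Guess the Caesar key by assuming the most frequent letter is an E."""
    counts = Counter(ch for ch in ciphertext.upper()
                     if ch in string.ascii_uppercase)
    most_common_letter = counts.most_common(1)[0][0]
    return (ord(most_common_letter) - ord("E")) % 26

# Made-up message, deliberately rich in E's so the assumption holds.
secret = shift_text("MEET ME HERE BEFORE THE EVENING ENDS", 3)

key = guess_shift(secret)
print("cipher text:", secret)
print("guessed key:", key)
print("decrypted:  ", shift_text(secret, -key))
```

If the decrypted text does not look like a message, the guess was wrong; try the next most common letter as E, or assume the top letter is a T or an A instead. With only 26 possible keys, the frequency count simply tells you which ones to try first.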
