Tuesday, October 15, 2013

Codes and Ciphers: Frequency Analysis

Frequency analysis is a cryptanalysis technique that studies how often each letter occurs in a ciphertext. In English, some letters are used far more often than others. This fact can be used to make educated guesses when deciphering a monoalphabetic substitution cipher.

Monoalphabetic Ciphers

A monoalphabetic cipher uses the same substitution across the entire message. For example, if you know that the letter A is enciphered as the letter K, this will hold true for the entire message. Messages of this type can be cracked using frequency analysis, educated guesses, and trial and error. Examples include:
  • Caesar Cipher
  • Atbash Cipher
  • Keyword Cipher
  • Pigpen / Masonic Cipher
  • Polybius Square
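
To make the "same substitution across the entire message" idea concrete, here is a minimal sketch of a keyword cipher in Python. The function names (`keyword_alphabet`, `encipher`, `decipher`) and the keyword `KEYWORD` are my own choices for illustration, not part of any standard. With this keyword, A happens to be enciphered as K, as in the example above.

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Build a cipher alphabet: keyword letters first (no repeats), then the rest of A-Z."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def encipher(plaintext: str, cipher_alphabet: str) -> str:
    """Replace every plaintext letter using one fixed mapping (monoalphabetic)."""
    table = str.maketrans(string.ascii_uppercase, cipher_alphabet)
    return plaintext.upper().translate(table)

def decipher(ciphertext: str, cipher_alphabet: str) -> str:
    """Apply the same mapping in reverse."""
    table = str.maketrans(cipher_alphabet, string.ascii_uppercase)
    return ciphertext.upper().translate(table)

alphabet = keyword_alphabet("KEYWORD")  # hypothetical keyword, chosen for illustration
secret = encipher("attack at dawn", alphabet)
print(alphabet)  # KEYWORDABCFGHIJLMNPQSTUVXZ
print(secret)    # KQQKYF KQ WKUI
```

Every A becomes K and every T becomes Q wherever they appear, which is exactly the regularity that frequency analysis exploits.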
Here is the English alphabet ordered from the most to the least frequently used letter (the exact order varies slightly with the text sampled):
E, T, A, O, I, N, S, R, H, L, D, C, U,
M, F, P, G, W, Y, B, V, K, X, J, Q, Z

Frequency analysis

Some ciphertexts are produced by replacing each letter with another. To start deciphering such a message, it is useful to count how often each letter occurs. The most frequent ciphertext letter may stand for E, the most common letter in English, followed by T, A, O and I, whereas the least frequent are likely to be Q and Z. Typical percentages in standard English are:

Letter  a     b     c     d     e     f     g     h     i     j     k     l     m
%       8.2   1.5   2.8   4.3   12.7  2.2   2.0   6.1   7.0   0.2   0.8   4.0   2.4

Letter  n     o     p     q     r     s     t     u     v     w     x     y     z
%       6.7   7.5   1.9   0.1   6.0   6.3   9.1   2.8   1.0   2.4   0.2   2.0   0.1

and ranked in order:

Letter  e     t     a     o     i     n     s     h     r     d     l     u     c
%       12.7  9.1   8.2   7.5   7.0   6.7   6.3   6.1   6.0   4.3   4.0   2.8   2.8

Letter  m     w     f     y     g     p     b     v     k     x     j     q     z
%       2.4   2.4   2.2   2.0   2.0   1.9   1.5   1.0   0.8   0.2   0.2   0.1   0.1

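
The frequency count described above can be sketched in a few lines of Python. This is only an illustration: the function name `letter_frequencies` is my own, and it assumes the ciphertext uses the letters A to Z, ignoring everything else.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percentage) pairs, most frequent first.

    Assumption: only the letters A-Z count; digits, spaces and
    punctuation are ignored.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    counts = Counter(letters)
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]

# Print the ranking for a short sample ciphertext.
for letter, percent in letter_frequencies("KQQKYF KQ WKUI"):
    print(f"{letter}: {percent:.1f}%")
```

The letters at the top of the output are the first candidates for E, T, A, O and I. Short messages give unreliable counts, so treat the ranking as a starting point for guesses rather than a direct mapping.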
Common letter pairs include the consonant pair TH and the vowel pair EA. Other frequent pairs, which are also common two-letter words, are OF, TO, IN, IT, IS, BE, AS, AT, SO, WE, HE, BY, OR, ON, DO, IF, ME, MY and UP. Common doubled letters are SS, EE, TT, FF, LL, MM and OO. Common triplets are THE, EST, FOR, AND, HIS, ENT and THA.
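
Pairs, doubled letters and triplets can be counted in the same way as single letters. The sketch below is one possible approach; the names `ngram_counts` and `doubled_letters` are hypothetical, and it assumes the ciphertext keeps its word spacing, so groups are only counted inside words.

```python
import re
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def doubled_letters(text: str) -> Counter:
    """Count only the pairs made of the same letter twice (SS, EE, ...)."""
    pairs = ngram_counts(text, 2)
    return Counter({p: c for p, c in pairs.items() if p[0] == p[1]})

sample = "the other three trees"
print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3).most_common(3))
print(doubled_letters(sample))
```

In a real ciphertext, the most frequent pair and triplet are natural candidates for TH and THE, and a frequent doubled letter is worth testing against SS, EE, TT, FF, LL, MM and OO.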
If the frequency count shows E and then T as the most common letters, the letters have probably not been replaced at all, so the ciphertext may be a transposition cipher rather than a substitution cipher. If one character has a frequency approaching 20%, the language may be German, which has a very high percentage of E. Italian has 3 letters with a frequency greater than 10% and 9 letters with a frequency below 1%.
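
One way to test the transposition idea is to measure how closely the ciphertext frequencies match the English table. The sketch below uses a chi-squared statistic, which is my own choice of measure and is not mentioned in the sources; `ENGLISH_PERCENT` and `chi_squared_vs_english` are hypothetical names, and the percentages are the ones listed earlier in this post. A low score suggests the letters are unchanged (transposition), while a high score suggests they were substituted.

```python
from collections import Counter

# Percentages from the English frequency table earlier in this post.
ENGLISH_PERCENT = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.2, "k": 0.8, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.1, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 1.0, "w": 2.4, "x": 0.2, "y": 2.0, "z": 0.1,
}

def chi_squared_vs_english(text: str) -> float:
    """Sum of (observed - expected)^2 / expected over the 26 letters.

    Smaller values mean the text's letter counts look more like English.
    """
    letters = [ch for ch in text.lower() if ch in ENGLISH_PERCENT]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    score = 0.0
    for letter, percent in ENGLISH_PERCENT.items():
        expected = total * percent / 100.0
        score += (counts[letter] - expected) ** 2 / expected
    return score

plain = "it was the best of times it was the worst of times"
transposed = plain[::-1]  # reversing the text is a trivial transposition
print(chi_squared_vs_english(plain) == chi_squared_vs_english(transposed))  # True
```

A transposition only rearranges letters, so its score is identical to that of the plaintext. The same function could be pointed at a German or Italian frequency table to guess the language, but those tables are not given here.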
Sources:
http://www.braingle.com/brainteasers/codes/frequencyanalysis.php
http://www.richkni.co.uk/php/crypta/freq.php
http://cryptoclub.math.uic.edu/substitutioncipher/frequency_txt.htm


