Frequency analysis is a cryptanalysis technique that studies how often
each letter occurs in a ciphertext. In English, certain letters are
used more often than others. This fact can be used to make educated
guesses when deciphering a monoalphabetic substitution cipher.
Monoalphabetic Ciphers
A monoalphabetic cipher uses the same substitution across the entire
message. For example, if you know that the letter A is enciphered as
the letter K, this holds true for the whole message. Messages of this
type can be cracked using frequency analysis, educated guesses, and
trial and error. Examples of monoalphabetic ciphers include:
- Caesar Cipher
- Atbash Cipher
- Keyword Cipher
- Pigpen / Masonic Cipher
- Polybius Square
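To make the idea of a single fixed substitution concrete, here is a minimal Python sketch that uses a Caesar shift as the example. The function names caesar_table and substitute and the shift of 10 are assumptions chosen for illustration (a shift of 10 happens to turn A into K, matching the example above); they do not come from any particular library.

```python
import string

def caesar_table(shift: int) -> dict:
    """Build a letter-for-letter table that shifts each letter by `shift` places."""
    alphabet = string.ascii_uppercase
    k = shift % 26
    shifted = alphabet[k:] + alphabet[:k]
    return dict(zip(alphabet, shifted))

def substitute(text: str, table: dict) -> str:
    """Apply the same table to every letter; other characters pass through."""
    return "".join(table.get(ch, ch) for ch in text.upper())

table = caesar_table(10)                    # illustrative shift: A becomes K
ciphertext = substitute("A BANANA", table)  # every A becomes K, every N becomes X
```

Because the table never changes during the message, every A in the plaintext ends up as the same ciphertext letter, which is exactly what frequency analysis exploits.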
Here is the English alphabet ordered from the most to the least frequently used letter:
E, T, A, O, I, N, S, R, H, L, D, C, U,
M, F, P, G, W, Y, B, V, K, X, J, Q, Z
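As a rough illustration of how this ordering can be used, the Python sketch below ranks the letters of a ciphertext by how often they occur and pairs them with the order listed above. The name guess_key is hypothetical, and the result is only a first guess: on short messages the ranking rarely matches English exactly, so the mapping still needs correcting by trial and error.

```python
from collections import Counter
import string

# Frequency order as listed in the text above.
ENGLISH_ORDER = "ETAOINSRHLDCUMFPGWYBVKXJQZ"

def guess_key(ciphertext: str) -> dict:
    """Map each ciphertext letter to a plaintext guess by frequency rank."""
    letters = [ch for ch in ciphertext.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    # Most frequent first; ties are broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, ENGLISH_ORDER))
```

For example, guess_key("XXXYYZ") pairs X with E, Y with T and Z with A.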
Frequency analysis
Text is sometimes encrypted by replacing each letter with another. To
start deciphering, it is useful to get a frequency count of all the
letters in the ciphertext. The most frequent ciphertext letter may represent E,
the most common letter in English, followed by T, A, O and I, whereas the least
frequent letters are Q and Z. Typical percentages in standard English are:
a | b | c | d | e | f | g | h | i | j | k | l | m
8.2 | 1.5 | 2.8 | 4.3 | 12.7 | 2.2 | 2.0 | 6.1 | 7.0 | 0.2 | 0.8 | 4.0 | 2.4
n | o | p | q | r | s | t | u | v | w | x | y | z
6.7 | 7.5 | 1.9 | 0.1 | 6.0 | 6.3 | 9.1 | 2.8 | 1.0 | 2.4 | 0.2 | 2.0 | 0.1
and ranked in order:
e | t | a | o | i | n | s | h | r | d | l | u | c
12.7 | 9.1 | 8.2 | 7.5 | 7.0 | 6.7 | 6.3 | 6.1 | 6.0 | 4.3 | 4.0 | 2.8 | 2.8
m | w | f | y | g | p | b | v | k | x | j | q | z
2.4 | 2.4 | 2.2 | 2.0 | 2.0 | 1.9 | 1.5 | 1.0 | 0.8 | 0.2 | 0.2 | 0.1 | 0.1
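The frequency count described above can be sketched in a few lines of Python. The function name letter_percentages is an assumption for illustration. It ignores anything that is not one of the 26 unaccented letters and returns percentages rounded to one decimal place, so the output can be compared directly with the English figures above.

```python
from collections import Counter
import string

def letter_percentages(text: str) -> dict:
    """Return each letter's share of the text in percent, most frequent first."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: round(100 * counts[ch] / total, 1) for ch in ranked}
```

For example, letter_percentages("Aaa-b!") gives a at 75.0 and b at 25.0. Real ciphertexts need to be reasonably long before the percentages settle near the table values.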
Common letter pairs include the consonant pair TH and the vowel pair EA.
Other common pairs are OF, TO, IN, IT, IS, BE, AS, AT, SO, WE, HE, BY, OR, ON,
DO, IF, ME, MY and UP. Common pairs of repeated letters are SS, EE, TT, FF,
LL, MM and OO. Common triplets are THE, EST, FOR, AND, HIS, ENT and THA.
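Pairs and triplets can be counted in the same way as single letters. The Python sketch below is one possible approach, with the hypothetical names ngram_counts and doubled_letters. It assumes the ciphertext keeps its word spacing and only counts groups inside a word; for ciphertext written without spaces the whole message would be treated as one long word.

```python
from collections import Counter
import re

def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive letters that lies inside a word."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def doubled_letters(text: str) -> Counter:
    """Count only the pairs made of the same letter twice, such as SS or EE."""
    pairs = ngram_counts(text, 2)
    return Counter({pair: c for pair, c in pairs.items() if pair[0] == pair[1]})
```

Calling ngram_counts(ciphertext, 2).most_common(5) lists the five most frequent pairs, which can then be tried against TH, EA and the other pairs named above.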
If the count shows that E, followed by T, is the most common letter in the
ciphertext, the letters have probably kept their identities, and the cipher may
be a transposition rather than a substitution. If one character has a
frequency of around 20%, the language may be German, which has a very high
percentage of E. Italian has 3 letters with a frequency greater than 10% and
9 letters with a frequency below 1%.
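The transposition check can be expressed as a small Python sketch. Both function names are hypothetical, and the test is only the heuristic described above: it looks at the two most common letters and nothing else, so a short or unusual message can easily fool it.

```python
from collections import Counter
import string

def _letter_counts(text: str) -> Counter:
    """Count the 26 unaccented letters, ignoring case and everything else."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)

def looks_like_transposition(ciphertext: str) -> bool:
    """True if the two most common letters are E then T, as in plain English."""
    top_two = [ch for ch, _ in _letter_counts(ciphertext).most_common(2)]
    return top_two == ["E", "T"]

def top_letter_share(ciphertext: str):
    """Return (letter, percent) for the most common letter, or None if no letters."""
    counts = _letter_counts(ciphertext)
    if not counts:
        return None
    letter, count = counts.most_common(1)[0]
    return letter, 100 * count / sum(counts.values())
```

The share returned by top_letter_share can be compared with the language figures mentioned above, for example to see whether a single letter dominates far more strongly than the 12.7% expected of E in English.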
http://www.braingle.com/brainteasers/codes/frequencyanalysis.php, http://www.richkni.co.uk/php/crypta/freq.php, http://cryptoclub.math.uic.edu/substitutioncipher/frequency_txt.htm