EverCrack Open Source Cryptanalysis

Monoalphabetic substitution ciphers use one cipher alphabet to encrypt the
letters of the clear text message. Again, the cipher alphabet can consist of
any type of symbol or set of symbols, as long as a specific clear text unit
always encodes to the same cipher text unit and each cipher text unit always
decodes to the same clear text unit. I say unit, as an abstraction, because
monoalphabetic substitution ciphers can be classified as one of two methods of
encryption: uniliteral ciphers and multiliteral ciphers. This will be explained
in depth later, but for now just assume that a unit means a single symbol
[clear or cipher].

Cipher alphabets generally come in three primary sequences:

1. Standard Sequence

2. Systematically Mixed Sequences

3. Random Sequences

A standard sequence consists of the cipher letters being in A-Z order [or Z-A
order in reverse standard sequences]. Both are equally "relatively" secure - by
relatively secure I mean that if the cryptanalyst knows the type of cipher and
is attacking the key then neither is more secure than the other, because both
have only 25 possible keys [25 possible shifts or rotations, as in the Caesar
cipher, which makes these a special sub-class of monoalphabetic substitution
ciphers known as displacement ciphers].
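
To make the 25-key claim concrete, here is a minimal Python sketch of a
displacement cipher and of trying every shift. It is only an illustration:
the function names are hypothetical and nothing here is taken from EverCrack.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_encrypt(message, key):
    """Encrypt with the standard sequence rotated by `key` places."""
    rotated = ALPHABET[key:] + ALPHABET[:key]
    return message.upper().translate(str.maketrans(ALPHABET, rotated))

def all_shift_candidates(ciphertext):
    """Return {key: candidate clear text} for each of the 25 non-trivial shifts."""
    return {key: shift_encrypt(ciphertext, (26 - key) % 26)
            for key in range(1, 26)}

ciphertext = shift_encrypt("THIS IS A SECRET MESSAGE", 3)
candidates = all_shift_candidates(ciphertext)
print(ciphertext)       # WKLV LV D VHFUHW PHVVDJH
print(len(candidates))  # 25
print(candidates[3])    # THIS IS A SECRET MESSAGE
```

With so few keys, a person can simply read through the 25 candidates and pick
the one that makes sense.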

For example, take the message: THIS IS A SECRET MESSAGE.

Clear Alphabet:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher Alphabet: Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

If each letter of the clear text message [in the Clear Alphabet] were encoded
using the letter directly beneath it [in the Cipher Alphabet], the encrypted
message would be: GSRH RH Z HVXIVG NVHHZTV.

This particular type of cipher alphabet is a "reverse-standard"

cipher alphabet because the alphabet has been reversed but it is still in standard

order [the form follows from A - Z].
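
The reverse-standard mapping is short enough to sketch in a few lines of
Python. This is an illustrative sketch only [the names CLEAR, CIPHER and
encrypt are my own, not part of EverCrack]:

```python
import string

CLEAR = string.ascii_uppercase   # A-Z
CIPHER = CLEAR[::-1]             # Z-A, the reverse-standard sequence

def encrypt(message):
    """Replace each clear letter with the letter beneath it in CIPHER."""
    return message.upper().translate(str.maketrans(CLEAR, CIPHER))

print(encrypt("THIS IS A SECRET MESSAGE"))  # GSRH RH Z HVXIVG NVHHZTV
```

Because the cipher alphabet is the clear alphabet reversed, applying the same
function to the cipher text gives back the clear text.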

A systematically mixed sequence uses a keyword to "salt" the start of the
clear text alphabet; then, omitting repeated letters, the rest of the alphabet
is written out. If the keyword is E V E R C R A C K, it is reduced to
E V R C A K and the clear text to cipher text alphabet below results. Again,
the message: THIS IS A SECRET MESSAGE

Clear Alphabet:  E V R C A K B D F G H I J L M N O P Q S T U W X Y Z
Cipher Alphabet: Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

encrypts to: FPOG OG V GZWXZF LZGGVQZ.
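
The keyword construction above can be sketched in Python as follows. This is
a minimal illustration under my own naming [keyed_alphabet is a hypothetical
helper, not an EverCrack function]:

```python
import string

def keyed_alphabet(keyword):
    """Keyword letters first (repeats dropped), then the unused letters in A-Z order."""
    seen = []
    for letter in keyword.upper() + string.ascii_uppercase:
        if letter in string.ascii_uppercase and letter not in seen:
            seen.append(letter)
    return "".join(seen)

clear = keyed_alphabet("EVERCRACK")
cipher = string.ascii_uppercase[::-1]   # Z-A, as in the table above
table = str.maketrans(clear, cipher)

print(clear)                                        # EVRCAKBDFGHIJLMNOPQSTUWXYZ
print("THIS IS A SECRET MESSAGE".translate(table))  # FPOG OG V GZWXZF LZGGVQZ
```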

The encrypted result looks different from the previous encryption but is
really no different - a single clear text unit maps to a single cipher text
unit. If the cryptanalyst attacks the key, the solution space is much greater
than with the previous method. If the cryptanalyst is using a generalized
solution [one that applies to all uniliteral, monoalphabetic substitution
ciphers], like EverCrack, then the methods are equally "difficult" - a
generalized solution covers all (26! - 1) possible random cipher alphabets
[which includes all of the above].

A random sequence is simply random. This makes the entire solution space
[possible alphabets used] (26! - 1). This is a very large number, and even a
brute force attack on modern computers would still take a long time. A brute
force attack would equate to a permutation attack against the alphabet itself.
On my 500 MHz laptop a permutation of 11 letters [39,916,800 permutations]
took approximately 25 minutes. A permutation of 26 letters
[403,291,461,126,605,635,584,000,000 permutations] is roughly 10 quintillion
times larger than the permutation of 11 letters, so suffice it to say that it
would take about 480,560,378,380,273 years to perform a permutation attack on
the cipher alphabet.
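
The arithmetic behind that estimate can be checked with a few lines of Python.
The sketch assumes a 365-day year and that the running time scales linearly
with the number of permutations; both are simplifying assumptions.

```python
import math

MINUTES_PER_YEAR = 60 * 24 * 365   # assumes a 365-day year

perms_11 = math.factorial(11)
perms_26 = math.factorial(26)
ratio = perms_26 // perms_11       # exact, since 11! divides 26!
minutes = ratio * 25               # 25 minutes per 11! permutations, as measured above
years = minutes // MINUTES_PER_YEAR

print(f"{perms_11:,}")   # 39,916,800
print(f"{perms_26:,}")   # 403,291,461,126,605,635,584,000,000
print(f"{ratio:,}")      # 10,103,301,395,066,880,000
print(f"{years:,}")      # 480,560,378,380,273
```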

The primary method for solving substitution ciphers is frequency analysis.
Frequency analysis relies on the fact that specific letters in specific
languages occur at fairly regular frequencies, measured as a percentage of a
text. This is because languages tend to be redundant [and some phonemes and
letter combinations that are easier to verbalize are used more frequently]. By
redundant I mean that most words [especially longer words] tend to have
repeated letters [e.g., "repetitive" has three letters, 'e', 't', and 'i',
that occur more than once].

Standard Letter Frequency Distribution from Highest to Lowest [English]:

Letter  Percentage    Letter  Percentage
E       12.77         C       2.96
T        8.55         M       2.88
O        8.07         P       2.23
A        7.78         Y       1.96
N        6.86         W       1.76
I        6.67         G       1.74
R        6.51         B       1.41
S        6.22         V       1.12
H        5.95         K       0.74
D        4.02         J       0.51
L        3.72         X       0.27
U        3.08         Z       0.17
F        2.97         Q       0.08

This distribution of frequencies tends to vary more as the size of the text
decreases - it appears most stable at around 1000 letters or more. Because a
monoalphabetic substitution cipher preserves these properties of the underlying
clear message, the cipher text letters will tend to exhibit the same frequency
distribution, so most letters can be easily guessed.
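
As a rough sketch of the counting step, the Python below tallies letter
percentages in a cipher text so they can be compared against the table above.
The function name letter_frequencies is my own, and the sample text is far too
short for the percentages to be reliable; it only shows the mechanics.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, percentage) pairs, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]

freqs = letter_frequencies("GSRH RH Z HVXIVG NVHHZTV")
for letter, percent in freqs[:3]:
    print(letter, f"{percent:.2f}")
```

In this sample the most frequent cipher letter is H [5 of 20 letters], which
stands for S in the reverse-standard example; on a text this short the top
cipher letter does not line up with E, which is why longer texts are needed.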

EverCrack does not use frequency analysis. It uses a comparison and reduction
approach based on the patterns of the cipher words. The full description of
how it works is in EverCrack Kernel, and the online tool is EverCrack Tool.
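
To give a feel for what a "pattern" of a cipher word can mean, here is a small
Python sketch that compares words by their letter-repetition pattern. It is an
assumption-laden illustration only and is NOT the EverCrack kernel: the
functions word_pattern and candidates and the tiny word list are hypothetical.

```python
def word_pattern(word):
    """Map each distinct letter to the index of its first appearance."""
    first_seen = {}
    pattern = []
    for letter in word.upper():
        if letter not in first_seen:
            first_seen[letter] = len(first_seen)
        pattern.append(first_seen[letter])
    return tuple(pattern)

def candidates(cipher_word, word_list):
    """Keep only the dictionary words whose pattern matches the cipher word."""
    target = word_pattern(cipher_word)
    return [w for w in word_list if word_pattern(w) == target]

words = ["SECRET", "LETTER", "CIPHER", "MESSAGE"]   # toy dictionary
print(word_pattern("HVXIVG"))        # (0, 1, 2, 3, 1, 4)
print(candidates("HVXIVG", words))   # ['SECRET']
```

A monoalphabetic substitution never changes this pattern, so each cipher word
narrows the set of clear words it could stand for.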

A multiliteral cipher consists of replacing each clear symbol with two or more
cipher symbols. If two cipher symbols are used to represent a single clear
symbol, then using the following plain and cipher alphabets:

Plain Alphabet:  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Cipher Alphabet: ZY XW VU TS RQ PO NM LK JI HG FE DC BA AB CD EF GH IJ KL MN OP QR ST UV WX YZ

clear text message:  THIS IS A SECRET MESSAGE

cipher text message: MNLKJIKL JIKL ZY KLRQVUIJRQMN BARQKLKLZYNMRQ
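
The multiliteral encipherment above can be reproduced with a short Python
sketch. The names CIPHER_UNITS, ENCODE and encrypt are my own; the two-letter
units are the ones listed in the alphabets above.

```python
import string

CLEAR = string.ascii_uppercase
CIPHER_UNITS = ("ZY XW VU TS RQ PO NM LK JI HG FE DC BA "
                "AB CD EF GH IJ KL MN OP QR ST UV WX YZ").split()
ENCODE = dict(zip(CLEAR, CIPHER_UNITS))   # A -> ZY, B -> XW, ...

def encrypt(message):
    """Replace every clear letter with its two-letter cipher unit, word by word."""
    return " ".join("".join(ENCODE[ch] for ch in word)
                    for word in message.upper().split())

print(encrypt("THIS IS A SECRET MESSAGE"))
# MNLKJIKL JIKL ZY KLRQVUIJRQMN BARQKLKLZYNMRQ
```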

On initial inspection the cipher appears more complicated, but if you know
that it is a multiliteral cipher it is as trivial to crack as a uniliteral
cipher - just a little extra manual work. It should be obvious that it is not
a uniliteral cipher, because it would then be an anomalous message consisting
of five words, three of which are quite lengthy [words of specific lengths
also tend to have specific frequency distributions].

The purpose of multiliteral ciphers is to hide the frequency distribution of
the underlying clear text. In the previous section, it was shown that every
language has a characteristic pattern to the distribution of the letters that
comprise that language. That property of language is a reference that can be
used to perform cryptanalysis. In this case, to make the cipher more difficult
to solve, that property of the language has been trivially disguised.

Assuming it is a multiliteral cipher, the EverCrack Multiliteral-to-Uniliteral
Cipher Convertor can simply replace the multiliteral polygraphs with
uniliteral monographs [the conversion is arbitrary] and then feed the result
into the EverCrack Monoalphabetic Substitution Cipher Cracker to crack the
message.
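
Such a conversion might look like the Python sketch below. This is only my
illustration of the idea, not the code of the EverCrack convertor: the
function to_uniliteral is hypothetical, it assumes fixed-size cipher units,
and it assigns letters in order of first appearance [any assignment would do].

```python
import string

def to_uniliteral(ciphertext, unit_size=2):
    """Replace each fixed-size cipher unit with one arbitrary letter."""
    mapping = {}
    words = []
    for word in ciphertext.split():
        if len(word) % unit_size:
            raise ValueError("word length is not a multiple of the unit size")
        letters = []
        for i in range(0, len(word), unit_size):
            unit = word[i:i + unit_size]
            if unit not in mapping:
                # Assumes at most 26 distinct units appear in the message.
                mapping[unit] = string.ascii_uppercase[len(mapping)]
            letters.append(mapping[unit])
        words.append("".join(letters))
    return " ".join(words), mapping

text, mapping = to_uniliteral("MNLKJIKL JIKL ZY KLRQVUIJRQMN BARQKLKLZYNMRQ")
print(text)   # ABCD CD E DFGHFA IFDDEJF
```

The output has the same word lengths and letter repetitions as
THIS IS A SECRET MESSAGE, so it can be attacked like any uniliteral cipher.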

[Specific Multiliteral Monoalphabetic Substitution Ciphers]

Polygraphic ciphers are created by encrypting more than one plain symbol at a
time [in this case, the number of cipher symbols the clear symbols encrypt to
does matter]. Since we are still dealing with monoalphabetic substitution
ciphers at this point, the correspondence between clear text units and cipher
text units is still one-to-one; even so, this type of cipher is considerably
more difficult to crack than a monographic cipher. Out of the 26 letters, a
pool of 676 digraphs can be constructed [although many digraphs, such as BX,
never actually appear in clear text]. Thus, there must be a few hundred cipher
digraphs to account for the clear digraphs [that are encrypted], depending on
the size of the message.

Following our sample clear text message: THIS IS A SECRET MESSAGE we have 9
unique digraphs [out of 10 in total] to encrypt [if the message does not
contain an even number of letters a "cipher pad" can be used]:

Clear Digraphs:  TH IS AS EC RE TM ES SA GE
Cipher Digraphs: AB CD EF GH IJ KL MN OP QR

with a resulting cipher text of: AB CD CD EF GH IJ KL MN OP QR
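
The digraphic encipherment can be sketched in Python as follows. The table is
the one given above; the helper names and the choice of "X" as the pad letter
are my own assumptions.

```python
TABLE = dict(zip("TH IS AS EC RE TM ES SA GE".split(),
                 "AB CD EF GH IJ KL MN OP QR".split()))

def to_digraphs(message, pad="X"):
    """Drop spaces, pad to an even length, and split into two-letter units."""
    letters = [ch for ch in message.upper() if ch.isalpha()]
    if len(letters) % 2:
        letters.append(pad)   # hypothetical pad letter for odd-length messages
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]

def encrypt(message):
    """Look up each clear digraph in TABLE (covers only this sample message)."""
    return " ".join(TABLE[d] for d in to_digraphs(message))

print(to_digraphs("THIS IS A SECRET MESSAGE"))
print(encrypt("THIS IS A SECRET MESSAGE"))   # AB CD CD EF GH IJ KL MN OP QR
```

Note that the word boundaries of the clear text disappear, since the digraphs
are formed across spaces.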

In the multiliteral cipher there was a fixed correspondence between the number
of clear symbols and the number of cipher units - in this case, fewer cipher
units result from polygraphic enciphering. This hides not only the frequency
distribution of the text but also the number of underlying symbols per se!
This means that a different form of frequency analysis must be performed:
digraphic frequency analysis [or polygraphic in general].
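
A first step in digraphic frequency analysis is simply counting the cipher
units. The Python sketch below does that for fixed-size, non-overlapping
units; the function name digraph_counts is my own and the sample is much too
short to be statistically meaningful.

```python
from collections import Counter

def digraph_counts(ciphertext, unit_size=2):
    """Count non-overlapping cipher units of `unit_size` symbols."""
    symbols = "".join(ciphertext.upper().split())
    units = [symbols[i:i + unit_size]
             for i in range(0, len(symbols) - unit_size + 1, unit_size)]
    return Counter(units)

counts = digraph_counts("AB CD CD EF GH IJ KL MN OP QR")
print(counts.most_common(1))   # [('CD', 2)]
print(len(counts))             # 9
```

On a long message these counts would be compared against a table of common
clear text digraphs, in the same way single letters are compared against the
letter frequency table.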

[Specific Polygraphic Monoalphabetic Substitution Ciphers]

