EverCrack Open Source Cryptanalysis Engine

Overview of EverCrack

The EverCrack kernel cryptanalyzes uniliteral, monoalphabetic substitution ciphers.

Substitution ciphers involve taking a clear text "this is a message" and replacing

the identity of each clear text letter with a cipher symbol [while retaining the

position of the letter]: "zyxw xw v utwwvst". Monoalphabetic ciphers involve using

only one alphabet [out of (26! - 1) possible alphabets] to encipher the message.

That is, if 't' enciphers to 'z' in the cipher text, no other clear letter will be

represented by 'z' and 't' will always encipher to 'z'.

Uniliteral ciphers involve using only one cipher symbol to represent a single clear

symbol ['t' becomes 'z', rather than 'zg' as in multiliteral ciphers; see Uniliteral Ciphers].
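
As an illustration of the definitions above, here is a minimal Python sketch of uniliteral, monoalphabetic encipherment. The function name and the partial alphabet are assumptions made for the example; only the letter pairs that can be read off the sample message are included.

```python
def encipher(clear_text, alphabet):
    """Replace each clear letter with its cipher symbol, keeping positions."""
    # A monoalphabetic alphabet must be one-to-one: no two clear letters
    # may share the same cipher symbol.
    if len(set(alphabet.values())) != len(alphabet):
        raise ValueError("cipher alphabet is not one-to-one")
    # Characters without an entry (such as spaces) are passed through as-is.
    return "".join(alphabet.get(ch, ch) for ch in clear_text)


# Partial alphabet taken from the sample message (hypothetical beyond that).
EXAMPLE_ALPHABET = {"t": "z", "h": "y", "i": "x", "s": "w",
                    "a": "v", "m": "u", "e": "t", "g": "s"}

print(encipher("this is a message", EXAMPLE_ALPHABET))  # zyxw xw v utwwvst
```

Because each clear letter maps to exactly one cipher symbol and word divisions are preserved, the cipher text keeps the length and letter-repetition pattern of every clear word.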

The form of cipher that EverCrack cracks is a weak cipher; EverCrack simply uses a very

efficient method to attack it. Such a cipher could still be encrypted using any one of

(26! - 1) = 403,291,461,126,605,635,583,999,999 possible alphabets.

Even with a modern computer, brute-forcing [trying every possible alphabet] would

take on the order of ten billion years at a billion alphabets per second. EverCrack

does it in milliseconds because of its kernel design. Since monoalphabetic ciphers do

not use keys, the encryption itself is the attack target [the possible alphabets] -

this effectively makes the pool of possible alphabets the key.
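
The size of the alphabet pool can be checked with a few lines of Python. This is only a worked calculation; the trial rate of one billion alphabets per second is an assumption chosen for illustration, not a measured figure.

```python
import math

# Every permutation of 26 letters except the identity alphabet.
alphabets = math.factorial(26) - 1
print(f"{alphabets:,}")  # 403,291,461,126,605,635,583,999,999

# Assumed brute-force rate, purely for illustration.
TRIALS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = alphabets / (TRIALS_PER_SECOND * SECONDS_PER_YEAR)
print(f"about {years:.2e} years to try every alphabet")
```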


Details of Kernel

The EverCrack kernel uses a boolean-algebraic algorithm to perform its cryptanalytic

attack. By boolean-algebraic, I mean it performs a process of comparison and

reduction to generate all possible internally consistent, valid decodes. The decodes

may have no semantic value, but they represent the smallest pool [based on the

dictionary set] of decodes in which each cipher symbol represents a clear-text symbol

[and this sometimes includes gibberish].

The first step involves gathering information about the cipher text:

-the number of cipher words

-the length of the cipher words

-the relative redundancy of the cipher words

Using this information, EverCrack resequences the cipher words from the longest to

shortest [see Optimizations].

Then, EverCrack catalogs which cipher words contain the most unique symbols

and uses these cipher words during the cryptanalytic attack [see Optimizations].

This optimization loop can run several times because the next step may render one or

more of this pool of cipher words as unusable [flagging it as SKIP].

The next step involves determining the list of clear words to use for decoding each

cipher word. If the cipher word is "zxyyzw", it uses the list [in the big dictionary]

"6P2.TXT". The '6' represents the number of letters in the cipher word. The 'P'

represents the fact that the cipher word has a pattern and the '2' represents the

pattern [2 repeated letters]. As these lists are determined, EverCrack checks

whether each list exists; if it does not, the cipher word is flagged as SKIP and the

optimization loop runs again.
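
The naming rule can be sketched in Python as follows. This is one reading of the description above, assuming the number after 'P' is the count of distinct symbols that occur more than once and that 'N' marks a word with no repeated symbols; the function name is hypothetical. Note that the table in this document lists 13P3.TXT for "vytsruquptoza", which this simple count would name 13P2.TXT, so the real rule may count repetitions differently.

```python
def word_list_name(cipher_word):
    """Guess the plain word list for a cipher word (assumed naming rule)."""
    length = len(cipher_word)
    # Count the distinct symbols that appear more than once in the word.
    repeated = sum(1 for s in set(cipher_word) if cipher_word.count(s) > 1)
    if repeated == 0:
        return f"{length}N.TXT"          # no pattern
    return f"{length}P{repeated}.TXT"    # pattern with `repeated` repeats


print(word_list_name("zxyyzw"))     # 6P2.TXT
print(word_list_name("zxzyvyuvw"))  # 9P3.TXT
print(word_list_name("vnsmzya"))    # 7N.TXT
```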

Let's take a sample message: zxzyvyuvw vytsruquptoza upp vnsmzya

Firstly, the length and pattern [letter redundancy] of each word is stored and

EverCrack sorts the cipherwords by length: vytsruquptoza zxzyvyuvw vnsmzya upp

[the original order is stored in the int array seq[WORDS] as a global].
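
The resequencing step can be sketched in Python like this. The name seq mirrors the array mentioned above; the helper functions and their signatures are assumptions for the example, not EverCrack's actual code.

```python
def resequence(cipher_text):
    """Sort cipher words longest first and remember where each came from."""
    words = cipher_text.split()
    # seq[i] is the original position of the i-th word after sorting.
    seq = sorted(range(len(words)), key=lambda i: len(words[i]), reverse=True)
    return [words[i] for i in seq], seq


def restore(sorted_words, seq):
    """Put words (or their decodes) back into the original message order."""
    original = [None] * len(seq)
    for word, position in zip(sorted_words, seq):
        original[position] = word
    return original


words, seq = resequence("zxzyvyuvw vytsruquptoza upp vnsmzya")
print(words)  # ['vytsruquptoza', 'zxzyvyuvw', 'vnsmzya', 'upp']
print(seq)    # [1, 0, 3, 2]
```

Keeping seq alongside the sorted words is what lets the finished decodes be returned in the order of the original message.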

Then the appropriate plain word list is assigned to each word [e.g., "zxzyvyuvw"

is a word of 9 letters with 3 repetitions so the plain word list is 9P3.TXT].

Next, EverCrack chooses which cipher words to flag as DUP - this is to reduce the

redundancy in the cipher text. Starting with the longest word, each word is checked

to see if it possesses any unique letters [letters not present in other cipher words

- if a longer word does, it is the word kept]. Words flagged as DUP will use the

letter-decodes of cipher words flagged as USE [default - see PartialDecrypt()].

In this case, "upp" is flagged as DUP because 'u' and 'p' are present in "vytsruquptoza".

So now the message is: "vytsruquptoza zxzyvyuvw vnsmzya".

Word	Word List	Words in List
vytsruquptoza	13P3.TXT	124
zxzyvyuvw	9P3.TXT	797
vnsmzya	7N.TXT	2485
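
The USE/DUP decision described above can be approximated with a short Python sketch. It assumes a word is kept as USE when it contributes at least one cipher symbol not already covered by a longer word, and is flagged DUP otherwise; the real selection logic is more involved [it also handles SKIP], so treat this only as an illustration.

```python
def flag_words(sorted_words):
    """Flag each word USE or DUP; sorted_words must be longest first."""
    covered = set()   # cipher symbols already supplied by USE words
    flags = []
    for word in sorted_words:
        symbols = set(word)
        # A word that adds no new symbol can be decoded from the others.
        flags.append((word, "USE" if symbols - covered else "DUP"))
        covered |= symbols
    return flags


for word, flag in flag_words(["vytsruquptoza", "zxzyvyuvw", "vnsmzya", "upp"]):
    print(word, flag)
```

On the sample message this flags only "upp" as DUP, since 'u' and 'p' both occur in the 13-letter word.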

Given each assigned plain word list, EverCrack starts with the first cipher word

["vytsruquptoza"] and gets the first plain word from its plain list ["abnormalities"],

and then grabs the next cipher word ["zxzyvyuvw"] and its first plain word

["abandoned"]. Then, if the two cipher words have cipher letters in common [e.g.,

'v', 'z', 'y'], it checks to see if the corresponding plain letters are equal. If they

are not [as in this case, since in "vytsruquptoza" 'v' translates to 'a' whereas in

"zxzyvyuvw" 'v' translates to 'd'], the plain word for the second cipher word is

discarded and the next plain word from that list ["abatement"] is checked. This

goes on until eventually the second list is exhausted and we get the second

word from the first list ["accomplishers"]; the process is repeated until we get to

"cryptanalyzes" and "evercrack" [see below]...

vytsruquptoza	zxzyvyuvw	vnsmzya
abnormalities	abandoned	abdomen
accomplishers	abatement	abducts
accomplishing	abilities	abelson
.............	.........	.......
cryptanalyzes	evercrack	ciphers
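
A minimal Python sketch of the consistency test follows. It assumes the test can be expressed as building a single one-to-one mapping in both directions [cipher to plain and plain to cipher]; the function name is hypothetical. Unlike the pairwise comparison described above, this version also rejects a plain word that contradicts its own cipher word.

```python
def consistent(*assignments):
    """True if all (cipher_word, plain_word) pairs share one 1-to-1 mapping."""
    forward, reverse = {}, {}   # cipher -> plain, plain -> cipher
    for cipher_word, plain_word in assignments:
        if len(cipher_word) != len(plain_word):
            return False
        for c, p in zip(cipher_word, plain_word):
            # setdefault records a new pair, or returns the existing one
            # so that a conflicting pair can be detected.
            if forward.setdefault(c, p) != p or reverse.setdefault(p, c) != c:
                return False
    return True


print(consistent(("vytsruquptoza", "abnormalities"),
                 ("zxzyvyuvw", "abandoned")))            # False
print(consistent(("vytsruquptoza", "cryptanalyzes"),
                 ("zxzyvyuvw", "evercrack")))            # True
```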

Next, EverCrack will see if the plain words have letters in common and see if they

translate back to the same cipher text letters [this prevents rare but possible

false-positives]. If so, EverCrack keeps track of its spot in list two and now

opens list three until it gets to "ciphers" - when the consistency checks are

complete it outputs this valid chain of decodes [partially decrypting "upp"

using the appropriate letters from "cryptanalyzes"] - the decodes are properly

returned to their original sequence:

"evercrack cryptanalyzes all ciphers"

"evercrack cryptanalyzes all copiers"

In this case, 2 valid decodes were found. Fortunately, the number of

decodes is small [which is rare for short ciphers] and the semantically correct

one is easily discernible. Here the cipher symbols for 'h' and 'i' each appeared

only once in the message; with no other instances to compare against, the

non-semantic decode was possible [although it is an internally consistent,

valid decode].
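
The whole walk through the lists can be sketched as a small backtracking search in Python. The word lists here are tiny stand-ins containing only words quoted in this document, and all function names are assumptions; the real kernel works on the full dictionary lists and is organized differently.

```python
def extend(mapping, cipher_word, plain_word):
    """Return mapping plus this word's letter pairs, or None on a conflict."""
    forward = dict(mapping)
    reverse = {p: c for c, p in forward.items()}
    for c, p in zip(cipher_word, plain_word):
        if forward.setdefault(c, p) != p or reverse.setdefault(p, c) != c:
            return None
    return forward


def decodes(use_words, word_lists, mapping=None):
    """Yield every mapping that decodes all USE words consistently."""
    mapping = mapping or {}
    if not use_words:
        yield mapping
        return
    first, rest = use_words[0], use_words[1:]
    for plain in word_lists[first]:
        extended = extend(mapping, first, plain)
        if extended is not None:          # prune inconsistent chains early
            yield from decodes(rest, word_lists, extended)


def decrypt(message, mapping):
    """Decode the full message, including DUP words, in original order."""
    return " ".join("".join(mapping.get(c, "?") for c in word)
                    for word in message.split())


# Stand-in lists: only words quoted in the text, not the real dictionary.
WORD_LISTS = {
    "vytsruquptoza": ["abnormalities", "accomplishers",
                      "accomplishing", "cryptanalyzes"],
    "zxzyvyuvw": ["abandoned", "abatement", "abilities", "evercrack"],
    "vnsmzya": ["abdomen", "abducts", "abelson", "ciphers", "copiers"],
}
MESSAGE = "zxzyvyuvw vytsruquptoza upp vnsmzya"

results = [decrypt(MESSAGE, m) for m in decodes(list(WORD_LISTS), WORD_LISTS)]
for line in results:
    print(line)
```

With these lists the search prints the same two decodes listed above; "upp" is never searched for directly and is filled in from the letters recovered for the longer words.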


Optimizations

Resequencing by Length

Why ordered by length? Look up at the listing where the number of words per

list is noted. If EverCrack compared every combination of words you would have:

124 x 797 x 2485 = 245,587,580 comparisons to perform

and if we also used the cipher word "upp" [which had 50 word list matches]:

124 x 797 x 2485 x 50 = 12,279,379,000 comparisons.

By ordering by length, I push the words with the smallest lists [yet with

high redundancy] to the front, so invalid results are more likely to appear

at the *start* of the lists, eliminating many potential comparisons

further down the lists.
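
The worst-case figures quoted above are simply the product of the list sizes, which can be reproduced in Python. The variable names are chosen for this example only.

```python
import math

# Word-list sizes quoted in the text for the three cipher words kept as USE.
list_sizes = [124, 797, 2485]
worst_case = math.prod(list_sizes)
print(f"{worst_case:,}")   # 245,587,580

# Including "upp" (50 candidate words) multiplies the count again.
with_upp = worst_case * 50
print(f"{with_upp:,}")     # 12,279,379,000
```

These are upper bounds for an exhaustive comparison; rejecting an inconsistent word early removes every combination that would have been built on top of it.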

Reducing Redundancy

By eliminating the cipher words which do not have unique cipher symbols

[preferring the longest of those that do] I effectively reduce the amount

of cipher text that must be cryptanalyzed to roughly 10% - which further

reduces the number of comparisons EverCrack must perform. This was probably

the milestone design optimization that made EverCrack fast. In fact, this

optimization made EverCrack quicker as the size of the input cipher text

increased [though leveling off at a certain amount]. On this particular

cipher, we have reduced the comparisons to: 2,487!


Possible Problems

No Decodes

EverCrack can fail to produce decodes for two basic reasons. First, and most

likely: because this is an online version written in PHP, there is an

automatic timeout of 30 seconds. The original EverCrack, written in pure C,

can crack any cipher up to 4000 words in less than a second [rare instances

of taking longer for short phrases with low redundancy do exist]. PHP, of

course, is slower than C, although the overall design of EverCrack keeps

it relatively fast. Second, the decisions EverCrack makes in choosing which

cipher words to use [to crack and then decode the rest of the message] can

be difficult when a word in the message is not in the dictionary

set. This is a very rare case, since EverCrack will exhaustively eliminate

words from the pool of selected words until a decode is found. This is a

very complex issue dealing with how EverCrack flags words as USE, DUP, and

SKIP to continually find the optimal pool of cipher words.

Too Many Decodes

EverCrack can produce many, many decodes in cases where the message is short

and the words within that message are what I define as "flexible". By that, I

mean words whose patterns and levels of redundancy make them interchangeable

with many other words in the dictionary [usually the pattern is

negative, with no redundancy, as in "weary"]. Most words that fall under this

category are 4-8 letters long [which is the pool of words of highest frequency

by length in text]. In uncommon cases this can make finding the proper decode

annoying. EverCrack works best with longer messages - it is quicker and almost

always produces only one decode.

Particular Decode Not Found

EverCrack can fail to produce a particular decode either because only the

small dictionary set was used [right now it is coded to switch to the

larger if the small proves unsuccessful] or because [as in the first problem,

No Decodes] the target word is not in the dictionary set but a different

string of words was found that did fit the pattern of the words in the

actual clear message.

Nonsense Words

EverCrack can produce nonsense words because it uses larger, more redundant

words to decode the smaller, less redundant words. This was the last

cryptanalytic optimization I made that pushed the speed of EverCrack far

beyond previous measures. I think the sacrifice of occasional [consistent]

garbage was worth it.

However, another reason for "nonsense" words can be attributed to the dictionary

itself. Initial development began on Linux using the dictionary file. I have

found myself surprised to see some words in the dictionary that I have since

removed - but sometimes I still encounter more.

Erroneous Input

Erroneous input may cause EverCrack to not produce results:

- input with punctuation marks [any input that isn't an alphabetic character]

- blocked ciphers [ciphers without proper word divisions]

Most of the time EverCrack can strip non-alphabetic characters. However,

if the character is a hyphen, EverCrack will concatenate the two parts to

form one word - this can result in a string of symbols that either has no

pattern match in the dictionary set or a word that matches that pattern but

is inconsistent with the rest of the cipher text.
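
The effect of stripping on hyphenated input can be seen in a small Python sketch. The function name, the lowercasing step, and the use of a regular expression are assumptions for the example; they are not taken from EverCrack's source.

```python
import re


def normalize(cipher_text):
    """Split on whitespace and keep only alphabetic characters."""
    words = []
    for token in cipher_text.lower().split():
        # Dropping every non-alphabetic character also removes hyphens,
        # which joins the two halves of a hyphenated token into one word.
        word = re.sub(r"[^a-z]", "", token)
        if word:
            words.append(word)
    return words


print(normalize("Zxzy-vyu, upp!"))  # ['zxzyvyu', 'upp']
```

Here "zxzy-vyu" becomes the single 7-letter word "zxzyvyu", whose pattern may not correspond to any real word in the dictionary set.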


What it Cracks

Types of Ciphers

EverCrack is a general-implementation, monoalphabetic substitution cipher

cracker. This means that it can crack any type of monoalphabetic substitution

cipher [out of the possible (26! - 1) cipher alphabets]. This includes, but is

not limited to:

Atbash Ciphers

Caesar Ciphers

Affine Ciphers

The Caesar Cipher Cracker is a specific implementation cipher cracker - it

only cracks displacement ciphers [out of the broader class of monoalphabetic

substitution ciphers]. EverCrack is a general-implementation cipher

cracker - it cracks all those ciphers [attacking the broader algorithmic

concept] which fall under the class of monoalphabetic substitution ciphers.
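
To see why these named ciphers all fall inside the monoalphabetic class, the Python sketch below builds the cipher alphabet for each one and checks that it is a one-to-one rearrangement of the 26 letters. The function names and the sample parameters are chosen for the example only.

```python
import string
from math import gcd

LETTERS = string.ascii_lowercase


def caesar_alphabet(shift):
    """Displacement cipher: every letter moves `shift` places."""
    return {p: LETTERS[(i + shift) % 26] for i, p in enumerate(LETTERS)}


def atbash_alphabet():
    """Reversed alphabet: 'a' <-> 'z', 'b' <-> 'y', and so on."""
    return dict(zip(LETTERS, reversed(LETTERS)))


def affine_alphabet(a, b):
    """Affine cipher E(x) = (a*x + b) mod 26; a must be coprime with 26."""
    if gcd(a, 26) != 1:
        raise ValueError("a must be coprime with 26")
    return {p: LETTERS[(a * i + b) % 26] for i, p in enumerate(LETTERS)}


def is_monoalphabetic(alphabet):
    """True if the alphabet is a one-to-one mapping over all 26 letters."""
    return (sorted(alphabet) == list(LETTERS)
            and sorted(alphabet.values()) == list(LETTERS))


for name, alphabet in [("caesar", caesar_alphabet(3)),
                       ("atbash", atbash_alphabet()),
                       ("affine", affine_alphabet(5, 8))]:
    print(name, "".join(alphabet[p] for p in LETTERS),
          is_monoalphabetic(alphabet))
```

Each of these is just one particular choice among the possible alphabets, which is why an attack on the general class covers all of them.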

Format of Ciphers

- Word Divisions: EverCrack only cracks ciphers with proper word divisions.

[see Word Divisions]

- Literality: EverCrack can only crack ciphers that are uniliteral

[that is, a single plain letter encrypts to a single cipher letter]. However,

the Multiliteral to Uniliteral Cipher Converter can be used to first

convert a multiliteral cipher into uniliteral format before using it with EverCrack.

- Punctuation: EverCrack will strip all punctuation. If the punctuation

is itself encrypted, or if punctuation symbols were used as encrypting symbols,

EverCrack will fail to produce the proper decodes. The application version can

handle ciphers in which an Extended ASCII symbol is used for encryption. The

application version is faster and available at SourceForge.
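
As a rough idea of what a multiliteral to uniliteral conversion involves, here is a hedged Python sketch. It assumes every clear letter was enciphered as a fixed-size group of symbols [2 by default] and assigns each distinct group a single letter in order of first appearance; it is not the actual converter named above, and the function name is hypothetical.

```python
import string


def to_uniliteral(cipher_words, group_size=2):
    """Replace each fixed-size symbol group with one letter (assumed scheme)."""
    symbols = {}   # symbol group -> single replacement letter
    converted = []
    for word in cipher_words:
        if len(word) % group_size:
            raise ValueError(f"{word!r} is not a whole number of groups")
        letters = []
        for i in range(0, len(word), group_size):
            group = word[i:i + group_size]
            if group not in symbols:
                if len(symbols) == 26:
                    raise ValueError("more than 26 distinct groups")
                # Letters are handed out in order of first appearance.
                symbols[group] = string.ascii_lowercase[len(symbols)]
            letters.append(symbols[group])
        converted.append("".join(letters))
    return converted


print(to_uniliteral(["zgqqzg", "qq"]))  # ['aba', 'b']
```

The output is itself a uniliteral, monoalphabetic cipher with the same word divisions, so it can be attacked like any other.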
