Separators in Ciphertext

Encryption Keys

Ciphertext Language Identification

Padding Ciphertext

Punctuation in Ciphertext

If solving a cryptogram depends upon being able to distinguish between

the start and end of each word then separators must exist. There are

four general cases (though two in essence):

1) separators exist and are in the clear:

e.g., zyxwvuts zyxwvr tq vqypwa xoxy zwyxz pq zyxwv nxmqyx

2) separators exist but are encrypted:

e.g., zyxwvutsczyxwvrctqcvqypwacxoxyczwyxzcpqczyxwvcnxmqyx

3) separators exist but are artificial ['c' is padded at end]:

e.g., zyxw vuts zyxw vrtq vqyp waxo xyzwy xzpq zyxw vnxm qyxc

4) separators do not exist:

e.g., zyxwvutszyxwvrtqvqypwaxoxyzwyxzpqzyxwvnxmqyx

The first two are equivalent except that the second requires the

cryptanalyst to discover which cipher symbols are encrypted separators.

The last two are equivalent in that artificial separators

are as useless as having no separators at all, since words are not

distinguishable (which is important for determining word lengths and

word patterns in some cases).

Visually, the first and the third appear to have spaces but the latter

is easily recognizable as blocked cipher text. However, the second and

the fourth are nearly indistinguishable. Before taking action, the

cryptanalyst must decide which is the case. I could just recursively

parse and see if any results occur (parsing through the EverCrack Kernel) or I

could perform a frequency count.

If the spaces are simply encrypted (mono-alphabetically) the space

character should appear as the most frequent character. At this point

I could replace those characters with a space and run it through the

kernel or I could (if the text is long enough) perform a frequency

analysis of the length of the words - calculate the average distance

for that cipher symbol. The average distance between spaces in text

is approximately 5.0 (which implies by default that the average length

of words is 4).
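
The frequency count and average distance described above can be sketched in a few lines of Python. This is only an illustration, not EverCrack's code: the function name separator_candidate is made up here, and on short texts a letter can tie with or outnumber the real separator, so the result is a candidate rather than a verdict.

```python
from collections import Counter

def separator_candidate(ciphertext):
    """Return (symbol, mean_gap) for the most frequent symbol.

    mean_gap is the average distance between consecutive occurrences
    of that symbol, or 0.0 if it occurs fewer than twice.
    """
    symbol, _count = Counter(ciphertext).most_common(1)[0]
    positions = [i for i, ch in enumerate(ciphertext) if ch == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return symbol, mean_gap

# Toy input in which 'q' stands for the encrypted space.
print(separator_candidate("abcdqefgqhijkqlmno"))  # ('q', 4.5)
```

A mean gap near 5.0 supports the guess that the symbol is a separator; a very different value suggests one of the special cases discussed next.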

However, there are two cases in which this average must be analyzed

further (breaking down the individual numbers):

1) when the words are in blocks (e.g., visible blocks of four)

2) when the cipher text has been transposed (in which the variance

may be great but will still reflect the standard average distance).

This can be determined by looking at the individual distances and

calculating their variance. If the average distance is not

correct then it is most likely that either there are no separators

used or the cipher text has undergone transposition. If the latter

is the case, then the cryptanalyst must submit the ciphertext to

different transpositions until the separators match the average

distance variance (if not, then it is *certainly* the case that

no separators are used).
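
Breaking the average down into individual distances might look like the sketch below. It assumes the candidate separator symbol is already known, and the name gap_profile and the "blocked" flag are inventions for this example. Identical gaps everywhere point to artificial blocks, while natural word boundaries show some spread.

```python
from statistics import mean, pvariance

def gap_profile(text, symbol):
    """Summarize the distances between consecutive occurrences of symbol.

    Returns None when the symbol occurs fewer than twice.
    """
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return None
    return {
        "mean": mean(gaps),
        "variance": pvariance(gaps),
        # Every gap identical: typical of fixed-size blocks.
        "blocked": len(set(gaps)) == 1,
    }

print(gap_profile("zyxw vuts zyxw vrtq", " "))   # blocked text
print(gap_profile("the cat sat on a mat", " "))  # natural word breaks
```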

Now, aside from those cases, if a cipher symbol is clearly the most

frequent one and its average distance is approximately 5.0, it can

be statistically assumed that that cipher symbol is indeed a

separator and replaced accordingly. This one discovery

has now provided more information to the cryptanalyst: the length

of all words and the linguistic pattern of each of those words in

the ciphertext [the pattern is only discernible if it is a

monoalphabetic cipher].
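
Once a separator symbol is accepted, word lengths and word patterns fall out directly. The sketch below uses the encrypted-separator example from the list of cases, with 'c' as the separator. The helper names and the tuple form of the pattern are choices made for this illustration only.

```python
def split_on_separator(ciphertext, separator):
    """Split the ciphertext into cipher 'words', dropping empty pieces."""
    return [word for word in ciphertext.split(separator) if word]

def word_pattern(word):
    """Number each symbol by order of first appearance.

    Words with the same repetition structure get the same pattern,
    which only helps if the cipher is monoalphabetic.
    """
    first_seen = {}
    pattern = []
    for ch in word:
        first_seen.setdefault(ch, len(first_seen))
        pattern.append(first_seen[ch])
    return tuple(pattern)

cipher = "zyxwvutsczyxwvrctqcvqypwacxoxyczwyxzcpqczyxwvcnxmqyx"
for word in split_on_separator(cipher, "c"):
    print(word, len(word), word_pattern(word))
```

A pattern such as (0, 1, 0, 2) for "xoxy" can then be matched against dictionary words with the same pattern.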

If no spaces are present in the ciphertext the next step depends

upon whether the ciphertext (if known) is a substitution or a

transposition cipher. If it is a substitution cipher the ciphertext

must be iteratively parsed with each parsed-pattern sequence (of 'words')

cryptanalyzed. If it is a transposition cipher the cipher-text is simply

parsed after various rounds of transposition and tested against dictionary

words (using either specific or general implementation cipher crackers).

Keys represent a particular implementation of an enciphering method.

In substitution systems, the key means two different things for

monoalphabetic and polyalphabetic systems. Generally, a monoalphabetic

system is not considered to use a key, but essentially, it does.

Consider the Caesar cipher, in its general class as a displacement

cipher. The cipher used a standard, direct cipher alphabet [the

letters were in alphabetic order in the standard left-to-right

direction]. Generalizing away from the specific implementation

of rotating the alphabet 3 positions [in which case the key was

the method or algorithm itself - which was secret], although the

cryptanalyzer may know the method, the cipher alphabet could have

been shifted any of 25 positions. The key, in this case, is the

specific number of shifts employed to encipher the message.
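
Because the key of a displacement cipher is just one of 25 shifts, every key can be tried in turn. The following is a minimal sketch, assuming lower-case a-z text with other characters left untouched; shift_text and all_shifts are names chosen for this example.

```python
import string

def shift_text(text, shift):
    """Shift each lower-case letter by `shift` places, wrapping at 'z'."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def all_shifts(ciphertext):
    """Map each possible key (1..25) to the text it would decipher to."""
    return {key: shift_text(ciphertext, -key) for key in range(1, 26)}

for key, candidate in all_shifts("khoor").items():
    print(key, candidate)
```

Reading down the 25 candidates (or checking each against a dictionary) reveals the key; for "khoor" the readable line is key 3.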

Moving to monoalphabetic ciphers, and generalizing to any of the

(26! - 1) alphabets that could have been used to encipher the

message, the key is really the specific cipher alphabet used

[as in the Caesar shift, the number of shifts represented the

specific cipher alphabet used]. In the polyalphabetic system,

the key [its length] represents the number of alphabets used

[that number out of a pool of (26! - 1) alphabets].
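
The two meanings of "key" can be put side by side in code. In this sketch, which is an illustration and not taken from EverCrack, a monoalphabetic key is a single cipher alphabet and a polyalphabetic key is a list of cipher alphabets used in rotation. The shifted() helper builds simple displaced alphabets purely for the demonstration; any permutation of the 26 letters would do.

```python
import string
from math import factorial

PLAIN = string.ascii_lowercase

def shifted(k):
    """A cipher alphabet displaced k places (one of many possible alphabets)."""
    return PLAIN[k:] + PLAIN[:k]

def encipher_mono(plaintext, cipher_alphabet):
    """Monoalphabetic: the key is one cipher alphabet."""
    return plaintext.translate(str.maketrans(PLAIN, cipher_alphabet))

def encipher_poly(plaintext, cipher_alphabets):
    """Polyalphabetic: the key is a sequence of cipher alphabets.

    Character i is enciphered with alphabet i mod key length
    (non-letters pass through but still advance the position).
    """
    out = []
    for i, ch in enumerate(plaintext):
        alphabet = cipher_alphabets[i % len(cipher_alphabets)]
        out.append(ch.translate(str.maketrans(PLAIN, alphabet)))
    return "".join(out)

print(encipher_mono("abc", shifted(3)))
print(encipher_poly("aaaa", [shifted(1), shifted(2)]))
print(factorial(26) - 1)  # size of the pool of non-trivial alphabets
```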

Keys have two essential properties: length and symbol space.

Both affect the strength of the key [how difficult it would be

to cryptanalyze the key] and the strength of the encryption [how

difficult it would be to cryptanalyze the message].

The length of the key represents how many characters can be used

in the key to encrypt the message [and therefore how many alphabets

will be used]. Generally, the longer the key the better [ideally,

a key of equal length to the message being encrypted would be very

secure since the length of the key would be of less use in cracking

the encrypted message].

The symbol space of the key represents how many types of

characters can be used in the key to encrypt the message [e.g., only

letters, letters and numbers, or any ASCII character, etc.].

Making more symbols available for the key makes it tougher to

cryptanalyze, but much less so than lengthening the key. Increasing

the length of the key increases the key space exponentially

whereas increasing the symbol space of the key increases the

key space only polynomially. If a key consists of five symbols [only

lower-case letters of the alphabet] the key space equals

11,881,376 possibilities. Now by doubling the symbol space

to include upper-case letters the key space now equals 380,204,032,

which is significantly greater. However, by retaining the initial

symbol space and doubling the key length the key space

equals a whopping 141,167,095,653,376 possibilities! The key space

can be calculated as follows: [symbol space] ^ [key length].

By doubling the symbol space the key space only increased by a

factor of 32 [2^5] whereas by doubling the key length the key space

increased by a factor of 11,881,376 [26^5].
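
The arithmetic is easy to check. The helper name key_space below is just a label for the formula [symbol space] ^ [key length].

```python
def key_space(symbol_space, key_length):
    """Number of possible keys: symbol_space raised to key_length."""
    return symbol_space ** key_length

base = key_space(26, 5)     # five lower-case letters
wider = key_space(52, 5)    # symbol space doubled
longer = key_space(26, 10)  # key length doubled

print(f"{base:,}")    # 11,881,376
print(f"{wider:,}")   # 380,204,032
print(f"{longer:,}")  # 141,167,095,653,376

print(wider // base)   # 32, i.e. 2 ** 5
print(longer // base)  # 11881376, i.e. 26 ** 5
```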

As far as strengthening the encryption, the key length [number of

alphabets used] increasingly hides the underlying linguistic

structure [frequency distribution of the letters].

Since all languages have structural properties particular to them,

these properties can be utilised to identify the probable language

of a cipher text.

For instance, each language has its own Index of Coincidence, which

can be compared against the cipher text to see which language's IC

most closely matches that of the cipher text. This method can be

used for both monoalphabetic substitution and transposition ciphers,

since neither of them changes the IC of the text.

EverCrack implements this method to guess the language behind

the ciphertext [EverCrack Language Identification Tool]. It first

calculates the IC then compares that value against the IC values of

known languages. The language whose IC value differs least from it

is chosen as the best guess. At this point you have a

better idea of which dictionary set to use to crack the cipher.
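
The IC comparison can be sketched as follows. This is not the EverCrack tool itself, only an illustration of the idea. The reference values in REFERENCE_IC are rough, commonly quoted figures included as an assumption; published values differ between sources, so they should be replaced with numbers measured from your own corpora.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two letters drawn from the text are the same."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# ASSUMPTION: approximate illustrative values, not authoritative.
REFERENCE_IC = {"english": 0.067, "german": 0.076, "french": 0.078}

def guess_language(text, reference=REFERENCE_IC):
    """Pick the language whose reference IC is closest to the text's IC."""
    ic = index_of_coincidence(text)
    return min(reference, key=lambda lang: abs(reference[lang] - ic))

sample = "zyxwvuts zyxwvr tq vqypwa xoxy zwyxz pq zyxwv nxmqyx"
print(index_of_coincidence(sample), guess_language(sample))
```

Short texts give unstable IC values, so the guess is only as good as the amount of ciphertext available.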

For transposition ciphers, since the identity of the letters does not

change, the frequency distribution of the cipher text can be matched

against the frequency distribution of a particular language. This is

primarily only useful for discriminating among a pool of languages

that share similar alphabetic symbols (e.g., English and German).
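
Matching frequency distributions might be sketched like this. The reference distributions are built from sample texts that the caller supplies, since no frequency tables are given here; the function names and the sum-of-absolute-differences measure are choices made for this example, and other distance measures would work as well.

```python
from collections import Counter
import string

def letter_profile(text):
    """Relative frequency of each letter a-z in the text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    return {ch: (counts[ch] / total if total else 0.0)
            for ch in string.ascii_lowercase}

def profile_distance(p, q):
    """Sum of absolute differences between two letter profiles."""
    return sum(abs(p[ch] - q[ch]) for ch in string.ascii_lowercase)

def closest_language(ciphertext, samples):
    """samples maps a language name to a representative plain text."""
    target = letter_profile(ciphertext)
    return min(samples,
               key=lambda lang: profile_distance(target, letter_profile(samples[lang])))

# A transposed text keeps the letter profile of its source.
print(closest_language("olleh dlrow", {"a": "hello world", "b": "zzzz qqqq"}))
```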

Padding is primarily applied to transposition ciphers for completing

a geometrical enciphering figure [e.g., in columnar ciphers]. The

padding also serves the second purpose of hiding the frequency

distribution from a cryptanalyzer. Transposition systems that

do not include substitution leave the frequency distribution of

the letters in the clear.

Padding can be very strategic in this manner. Padding with

letters that normally are not frequent can flatten the frequency

distribution [IC] which can make it more difficult for the

cryptanalyzer to identify the type of cipher and the

language of the message.
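
The flattening effect can be seen on a toy message. In this sketch the pad letters "qxzj" and the eight-column figure are arbitrary choices for illustration. The pads make the text longer without adding repeated letters, so the IC drops. On a long message a handful of pad letters would have a much smaller effect.

```python
from collections import Counter
from itertools import cycle

def index_of_coincidence(text):
    """Probability that two letters drawn from the text are the same."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def pad_to_grid(message, columns, pad_letters="qxzj"):
    """Pad the message so its length is a multiple of `columns`."""
    shortfall = (-len(message)) % columns
    pads = cycle(pad_letters)
    return message + "".join(next(pads) for _ in range(shortfall))

message = "attackatdawn"
padded = pad_to_grid(message, 8)
print(padded)
print(index_of_coincidence(message), index_of_coincidence(padded))
```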

Padding the message with vowels can disrupt the cryptanalyzer's

ability to perform vowel-analysis on particular rows or columns of a

generated matrix [this method can inform the cryptanalyzer of the

particular geometrical figure used to transpose the message].

EverCrack has only one method for dealing with pads in transposition

ciphers [albeit a relatively weak one]. The Matrix Cracker

generates all possible geometric figures and sends each

result to a parser which traverses through the resulting text

trying to build words that match against the dictionary.
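
The general idea of trying every figure and scoring the result against a dictionary can be sketched as below. This is not the Matrix Cracker's actual code. It assumes the simplest figure only: plaintext written row by row into a full rectangle and read off column by column, with no column key. The tiny dictionary and all function names are hypothetical.

```python
def grid_shapes(length):
    """All (rows, cols) rectangles that exactly hold `length` symbols."""
    return [(r, length // r) for r in range(2, length) if length % r == 0]

def read_by_rows(ciphertext, rows, cols):
    """Undo a column-by-column read of a rows x cols grid."""
    return "".join(ciphertext[c * rows + r]
                   for r in range(rows) for c in range(cols))

def count_words(text, dictionary):
    """Number of dictionary words that occur somewhere in the text."""
    return sum(1 for word in dictionary if word in text)

def matrix_candidates(ciphertext, dictionary):
    """Try every rectangle and rank the results by dictionary hits."""
    scored = []
    for rows, cols in grid_shapes(len(ciphertext)):
        candidate = read_by_rows(ciphertext, rows, cols)
        scored.append((count_words(candidate, dictionary), rows, cols, candidate))
    return sorted(scored, reverse=True)

# "attackatdawn" written into 3 rows of 4 and read down the columns.
for entry in matrix_candidates("acdtkatawatn", {"attack", "dawn"}):
    print(entry)
```

Pad letters at the end of the message do not stop the dictionary search from finding real words, which is why a parser of this kind can cope with simple padding.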

Punctuation in cipher-text can present problems similar to

separators - for punctuation acts somewhat like a word delimiter.

I would have to say that the first indication that punctuation is

being used [and encrypted] would be if the total number of cipher

text symbols exceeded the number of letters in the target

language's alphabet [of course for short messages where

not all letters were used this is more of a problem].

Fortunately, like letters, punctuation marks do have signature

frequencies and relative distances that they occur in clear text.

Using statistical analysis, one may be able to figure out which

cipher symbols are punctuation marks [and probably simply

remove them from the text].
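
Both checks, the symbol count and the per-symbol statistics, can be sketched together. The name symbol_report and the default alphabet size of 26 are assumptions for this example. No reference frequencies for punctuation are supplied here, so the report only gathers the raw numbers a cryptanalyst would then compare against known text.

```python
from collections import Counter

def symbol_report(ciphertext, alphabet_size=26):
    """Return (excess, report).

    excess: how many distinct symbols exceed the alphabet size
            (greater than zero hints at encrypted punctuation).
    report: symbol -> (count, mean distance between occurrences or None).
    """
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    excess = max(0, len(counts) - alphabet_size)
    report = {}
    for symbol, count in counts.items():
        positions = [i for i, ch in enumerate(ciphertext) if ch == symbol]
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        report[symbol] = (count, sum(gaps) / len(gaps) if gaps else None)
    return excess, report

excess, report = symbol_report("ab,ab,ab", alphabet_size=2)
print(excess)
for symbol, stats in sorted(report.items()):
    print(repr(symbol), stats)
```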
