Letter Frequency Analyzer
Analyze the frequency of letters in text. Compare distributions to English language patterns to help break substitution ciphers and identify encrypted text.
What is Letter Frequency Analysis?
Letter frequency analysis is a technique for studying how often each letter appears in a text. In English, letters do not occur equally often: E is the most common (about 12.7% of all letters), while Z is rare (about 0.07%). By analyzing these patterns, cryptanalysts can break substitution ciphers.
This tool counts letter occurrences in your text and compares them to expected English language frequencies, helping identify encrypted text and potential cipher substitutions.
How Frequency Analysis Works
The technique relies on a key principle: a simple substitution cipher preserves letter frequency patterns, because every plaintext letter is always replaced by the same ciphertext letter. If plaintext E (most common) encrypts to X, then X will be the most common letter in the ciphertext. By matching frequencies, you can deduce substitutions.
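A minimal sketch of this principle, assuming a randomly shuffled substitution alphabet. The helper names `make_key` and `encrypt` are hypothetical and only serve the illustration; they are not part of this tool.

```python
import random
import string
from collections import Counter


def make_key(seed: int = 0) -> dict[str, str]:
    """Build a random substitution alphabet (hypothetical helper)."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encrypt(plaintext: str, key: dict[str, str]) -> str:
    """Substitute each letter; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext.upper())


plaintext = "MEET ME AT THE GREEN TREE NEAR THE CREEK"
key = make_key()
ciphertext = encrypt(plaintext, key)

plain_counts = Counter(ch for ch in plaintext if ch.isalpha())
cipher_counts = Counter(ch for ch in ciphertext if ch.isalpha())

# The most common plaintext letter and the most common ciphertext letter
top_plain = plain_counts.most_common(1)[0][0]
top_cipher = cipher_counts.most_common(1)[0][0]
print(top_plain, "->", top_cipher)
```

The counts themselves are unchanged by the substitution; only the letters they are attached to differ, which is what makes frequency matching possible.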
Analysis Steps
- Count occurrences of each letter in the ciphertext
- Calculate the percentage frequency of each letter
- Compare to expected English frequencies
- Make educated guesses about letter substitutions
- Test hypotheses and refine
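The first two steps can be sketched in a few lines. This is an illustrative example, not the tool's actual implementation; the function name `letter_frequencies` is an assumption, and only the letters A-Z are counted.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the percentage frequency of each letter A-Z found in text."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the letters from most to least frequent
    return {letter: 100 * count / total for letter, count in counts.most_common()}


freqs = letter_frequencies("Hello, World!")
for letter, pct in freqs.items():
    print(f"{letter}: {pct:.1f}%")
```

Digits, spaces, and punctuation are ignored, so the percentages always add up to 100 across the letters that are present.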
English Letter Frequencies
Standard English letter frequencies (approximate percentages):
- E - 12.7% (most common)
- T - 9.1%
- A - 8.2%
- O - 7.5%
- I - 7.0%
- N - 6.7%
- S - 6.3%
- H - 6.1%
- R - 6.0%
The mnemonic "ETAOIN SHRDLU" lists the twelve most common letters in approximate order of frequency, and famously appeared on Linotype machines.
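One way to compare a text against these frequencies is a chi-squared score, where a lower score means a more English-like distribution. The sketch below is an assumption about how such a comparison could be done, not a description of this tool. The nine values listed above are used as given; the remaining seventeen are commonly cited approximations added for the example. As a demonstration it tries all 26 alphabet rotations of a shifted text and keeps the one that scores best. All function names are hypothetical.

```python
from collections import Counter

# Approximate English letter frequencies in percent. The first nine come from
# the list above; the rest are commonly cited approximations (an assumption).
ENGLISH_FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.10, "Z": 0.07,
}


def chi_squared(text: str) -> float:
    """Score how far the letter counts in text are from English expectations."""
    letters = [ch for ch in text.upper() if ch in ENGLISH_FREQ]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters A-Z")
    counts = Counter(letters)
    score = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = total * pct / 100
        score += (counts[letter] - expected) ** 2 / expected
    return score


def caesar_shift(text: str, shift: int) -> str:
    """Rotate every letter by shift positions; other characters are kept."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)


def best_caesar_shift(ciphertext: str) -> int:
    """Return the shift whose reversal gives the most English-like text."""
    return min(range(26), key=lambda s: chi_squared(caesar_shift(ciphertext, -s)))


sample = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness")
ciphertext = caesar_shift(sample, 7)
shift = best_caesar_shift(ciphertext)
print(shift, caesar_shift(ciphertext, -shift))
```

The score is only meaningful as a comparison between candidates; on very short texts even the correct candidate can score poorly.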
Frequency Analysis in Geocaching
Puzzle caches often use substitution ciphers that frequency analysis can help break:
Identifying Cipher Types
- Simple substitution: The frequency profile matches English, but the peaks fall on different letters
- Caesar cipher: The English profile is intact but shifted along the alphabet (every letter offset by the same amount)
- Random text: Approximately equal frequencies (not likely a substitution cipher)
- Polyalphabetic: Flattened frequencies (like Vigenère)
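How "flattened" a distribution is can be put into a single number. One common measure, named here as an addition rather than something this tool is known to compute, is the index of coincidence: the probability that two letters picked at random from the text are the same. English text gives roughly 0.067, uniformly random letters roughly 0.038 (1/26), and polyalphabetic ciphertext tends to fall in between. A minimal sketch with a hypothetical function name:

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two randomly chosen letters of text are identical."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# A simple substitution or Caesar shift only relabels letters,
# so the value is the same before and after encryption.
print(index_of_coincidence("HELLO"), index_of_coincidence("URYYB"))
```

Because the value does not depend on which letter is which, it helps separate simple substitution and Caesar ciphers (English-like value) from polyalphabetic or random text (lower value) before any letters have been guessed.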
Breaking Substitution Ciphers
To crack a simple substitution cipher:
- Find the most common letter in the ciphertext (likely E)
- Look for common two-letter patterns (TH, HE, AN, IN)
- Identify common three-letter words (THE, AND, FOR)
- Use context to refine guesses
- Build the full substitution alphabet
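A rough starting point for these steps is to line up the ciphertext letters, ranked by frequency, against the English ranking. The sketch below is only a first guess to be refined by hand: the first twelve letters of `ENGLISH_ORDER` follow ETAOIN SHRDLU, while the order of the remaining letters is an assumption that varies between sources. The function names are hypothetical.

```python
from collections import Counter

# Letters from most to least common. The first twelve follow ETAOIN SHRDLU;
# the order of the rest is an assumption and differs between sources.
ENGLISH_ORDER = "ETAOINSHRDLUCMWFGYPBVKJXQZ"


def initial_guess(ciphertext: str) -> dict[str, str]:
    """Map ciphertext letters to plaintext guesses purely by frequency rank."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    ranked = [letter for letter, _ in Counter(letters).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


def apply_guess(ciphertext: str, guess: dict[str, str]) -> str:
    """Replace each mapped letter; anything unmapped is left as it is."""
    return "".join(guess.get(ch, ch) for ch in ciphertext.upper())


guess = initial_guess("XXXX YYY ZZ Q")
print(guess)
print(apply_guess("XXXX YYY ZZ Q", guess))
```

On real puzzles only the top few assignments are usually trustworthy; the lower ranks are close together in frequency and need the bigram, trigram, and context checks to sort out.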
Common Letter Patterns
Beyond single letters, these patterns help crack ciphers:
Most Common Bigrams (Two-Letter)
- TH, HE, IN, ER, AN, RE, ON, AT, EN, ND
Most Common Trigrams (Three-Letter)
- THE, AND, ING, HER, ERE, ENT, THA, NTH
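Counting bigrams and trigrams in a ciphertext works much like counting single letters. In this sketch the function name `ngram_counts` is hypothetical, and sequences are counted only inside words, which is a design choice of the example; some frequency tables count across word boundaries instead.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-letter sequences inside words (never across a space)."""
    counts = Counter()
    # Treat every non-letter as a word break (this also splits at apostrophes)
    cleaned = "".join(ch if "A" <= ch <= "Z" else " " for ch in text.upper())
    for word in cleaned.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


bigrams = ngram_counts("THE THEME", 2)
trigrams = ngram_counts("THE THEME", 3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

The most frequent ciphertext bigrams and trigrams are candidates for the patterns listed above, for example the top trigram for THE.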
Common Short Words
- One letter: A, I
- Two letters: OF, TO, IN, IT, IS, BE, AS, AT, SO
- Three letters: THE, AND, FOR, ARE, BUT, NOT
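When the ciphertext keeps its spaces, short words can be tallied directly and compared with the lists above. A minimal sketch, assuming word boundaries are preserved; the function name `short_word_counts` is hypothetical.

```python
import re
from collections import Counter


def short_word_counts(ciphertext: str, max_len: int = 3) -> dict[int, Counter]:
    """Group words of up to max_len letters by length and count each one."""
    words = re.findall(r"[A-Z]+", ciphertext.upper())
    by_length: dict[int, Counter] = {}
    for word in words:
        if len(word) <= max_len:
            by_length.setdefault(len(word), Counter())[word] += 1
    return by_length


result = short_word_counts("QEB NRFZH QEB A QEB")
for length in sorted(result):
    print(length, result[length].most_common(3))
```

A one-letter ciphertext word is a candidate for A or I, and the most frequent three-letter word is a candidate for THE.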
Why ETAOIN?
The sequence ETAOIN SHRDLU comes from Linotype typesetting machines, where the first two columns of keys were arranged by letter frequency. When operators made a mistake, they would sometimes run their fingers down these columns to fill out the faulty line so it could be discarded, and such lines occasionally slipped into printed text.
Limitations of Frequency Analysis
Frequency analysis works best with:
- Longer texts: Short texts may have unusual distributions
- Standard English: Technical jargon or names may skew results
- Simple ciphers: Polyalphabetic ciphers flatten frequencies
For very short geocaching puzzles, frequency analysis provides hints but may require additional techniques.
Tips for Geocaching Puzzles
- Start with the most common: The highest-frequency letter is probably E or T
- Look for single-letter words: In English, these are usually A or I
- Find THE: Look for a frequently repeated three-letter word whose first and last letters are both high-frequency ciphertext letters (likely T and E)
- Check for coordinates: Look for N, S, E, W, spelled-out directions, and numbers that might indicate hidden coordinates
Related Cryptanalysis Tools
Use these tools alongside frequency analysis:
- Caesar Cipher: Test systematic shifts with brute force
- Vigenere Cipher: For polyalphabetic ciphers
- Character Counter: Get word and character statistics
- ROT13: Quick test for the most common rotation (a Caesar shift of 13)