December 2010 Update: I have changed the weightings to reduce the weight of programming text in an attempt to make the data more accurate for the non-programmer. I have also added another data set where programming text is removed entirely.
April 2009 Update: I have collected much more data, and now have improved letter frequency. How accurate is it, though? It should be accurate enough, but look here if you're really interested in accuracy.
Character Frequency: SPC e t a o i n s r h l d c u m f g p y w ENT b , . v k - " _ ' x ) ( ; 0 j 1 q = 2 : z / * ! ? $ 3 5 > { } 4 9 [ ] 8 6 7 \ + | & < % @ # ^ ` ~
Letter Frequency: e t a o i n s r h l d c u m f g p y w b v k x j q z
Punctuation Frequency: , . - " _ ' ) ( ; = : / * ! ? $ > { } [ ] \ + | & < % @ # ^ ` ~
Number Frequency: 0 1 2 3 5 4 9 8 6 7
Big Key Frequency*: SPACE SHIFT BACKSPACE ENTER TAB
*Approximate order. SHIFT is currently not testable, BACKSPACE depends on the user's error rate, and some text editors automatically indent.
Digraph Frequency:
th he in er an re on at en nd st or te es is ha ou it to ed ti ng ar se al nt as le ve of me hi ea ne de co ro ll ri li ra io be el ch ic ce ta ma ur om ho et no ut si ca la il fo us pe ot ec lo di ns ge ly ac wi wh tr ee so un rs wa ow id ad ai ss pr ct we mo ol em nc rt sh po ie ul im ts am ir yo fi os pa ni ld sa ay ke mi na oo su do ig ev gh bl if tu av pl wo ry bu
Trigraph Frequency: the ing and ion ent hat her tio tha for ter ere his you thi ate ver all ati ith rea con wit are ers int nce sta not eve res ist ted ons ess ave ear out ill was our men pro com est ome one ect ive tin hin hav ght but igh ore ain str oul per sti ine uld ste tur man oth oun rom ble nte ove ind han hou whi fro use der ame ide ort und rin cti ant hen end tho art red lin
Word Frequency: the of to and a in that is i it for as with you on was be he this not have are at if but by from his or they an which we all said one had will my s so has their more there no what were when would your her can been she out who some do about me up new x him other them time than t into like only now its then may any how could mr two our very these end first just people after get also even most should return over such many see well know much struct good before same long way because make those think must where down int here being us u little did last
The character, digraph and trigraph frequencies above take programming text into account. But since most people are not programmers, it is also useful to see character frequency when programming text is entirely removed. (Big key frequency is unchanged so it is not included.)
Character Frequency: e t a o i n s r h l d c u m g f p w y b , . v k ' " - x 0 j 1 q 2 z ) ( : ! ? 5 ; 3 4 9 / 8 6 7 [ ] % $ | * = _ + > \ < & ^ # @ ` ~ { }
Letter Frequency: e t a o i n s r h l d c u m g f p w y b v k x j q z
Punctuation Frequency: , . ' " - ) ( : ! ? ; / [ ] % $ | * = _ + > \ < & ^ # @ ` ~ { }
Number Frequency: 0 1 2 5 3 4 9 8 6 7
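The two data sets differ only in how much weight programming text receives. The page does not say how the weighting is done, so the following Python sketch is purely an illustration of one possible scheme: each source is reduced to relative character frequencies, scaled by a weight, and summed. The function name `weighted_ranking`, the source labels and the weight values are all hypothetical.

```python
from collections import Counter

def weighted_ranking(sources, weights):
    """Rank characters across several text sources, most frequent first.

    sources: dict mapping a label (e.g. "prose", "code") to its text.
    weights: dict mapping the same labels to a non-negative weight.
    Both the labels and the weighting scheme are assumptions, not the
    method used to produce the lists on this page.
    """
    combined = Counter()
    for name, text in sources.items():
        # Count every non-whitespace character, case-insensitively.
        counts = Counter(ch for ch in text.lower() if not ch.isspace())
        total = sum(counts.values())
        if total == 0:
            continue
        for ch, n in counts.items():
            # Relative frequency within the source, scaled by its weight.
            combined[ch] += weights.get(name, 0.0) * n / total
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    # Characters that only occur in zero-weight sources are dropped.
    return [ch for ch, score in ranked if score > 0]
```

Setting the weight of the programming source to zero corresponds to the second data set, where programming text is removed entirely.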
Dictionary Frequency
Unix dictionary (234936 words): e i a o r n t s l c u p m d h y g b f v k w z x q j
SCOWL dictionary of everyday words (68124 words): e s i a r n t o l c d u g p m ' h b y f k v w x q j z
SCOWL complete dictionary (576920 words): e s i a n r o t l c u d p m h g b y ' f v k w z x j q - /
Note: These word lists contain both singulars and plurals, so words with plurals are over-represented in the letter frequency count.
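A dictionary count differs from a running-text count because every entry is tallied exactly once, however common or rare the word is. The short Python sketch below shows the idea; the function name `dictionary_letter_ranking` is hypothetical, and the choice to count every non-whitespace character (so that apostrophes and hyphens can appear in the ranking) is an assumption.

```python
from collections import Counter

def dictionary_letter_ranking(words):
    """Rank characters by how often they occur across a word list.

    Each entry counts once, so a singular and its plural both
    contribute their letters.
    """
    counts = Counter()
    for word in words:
        # Assumption: fold case and keep any non-whitespace character.
        counts.update(ch for ch in word.strip().lower() if not ch.isspace())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ch for ch, _ in ranked]

# "tree" and "trees" are separate entries, so their letters count twice.
print(dictionary_letter_ranking(["tree", "trees", "bee"]))
# ['e', 'r', 't', 'b', 's']
```

Any iterable of strings works as input, including an open word-list file with one word per line.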
More Frequencies
Double Character: ll SPC-SPC ENT-ENT ee ss oo tt ff rr pp 00 mm nn .. cc dd -- TAB-TAB gg !! )) :: // ** bb __ == 11 99 || ii && ww 22 ++ aa zz '' $$ (( xx 33 55 44 "" << \\ }} 66 88 kk 77 qq ## hh ^^ yy ?? uu >> ]] vv [[ jj ;; %% ,, @@ `` {{ ~~
Double Letter: ll ee ss oo tt ff rr pp mm nn cc dd gg bb ii ww aa zz xx kk qq hh yy uu vv jj
First Letter: t a s i o w c b h p f m d r e n l g u y v k j x q z
Second Letter: o e h a n i r u t f s l p y c m d x v b w g q k z j
Third Letter: e t r a s o i d n l m u c p v g f w y b k h x z j q
Last Letter: e s t d n r y o f g l a h m k i w p c u x b v q j z
First Digraph: th in an to of re co be st ha wh fo se he wi no ma is pr it on de ca wa so mo as hi ar we yo al pa ne li sa do di at fi
Last Digraph: he ed nd ng er to of on in is re at es or nt as st ly en an le ll se it al ts th me ve ce ut ch te ld rs ne ns ry id ay
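Positional lists such as First Letter, Last Letter, First Digraph and Last Digraph can be produced by splitting the text into words and tallying the relevant slice of each one. The Python sketch below is one way to do it, under stated assumptions: a word is taken to be a run of ASCII letters (so "don't" splits into "don" and "t"), and a two-letter word counts toward both the first and the last digraph. The name `positional_rankings` is hypothetical.

```python
import re
from collections import Counter

# Assumption: a word is any run of ASCII letters after lowercasing.
WORD_RE = re.compile(r"[a-z]+")

def positional_rankings(text):
    """Return rankings (most frequent first) for word-position statistics."""
    words = WORD_RE.findall(text.lower())
    tallies = {
        "first_letter": Counter(w[0] for w in words),
        "last_letter": Counter(w[-1] for w in words),
        # Digraph tallies skip one-letter words.
        "first_digraph": Counter(w[:2] for w in words if len(w) >= 2),
        "last_digraph": Counter(w[-2:] for w in words if len(w) >= 2),
    }
    return {
        name: [key for key, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for name, c in tallies.items()
    }

result = positional_rankings("the cat and the hat")
print(result["first_letter"])   # ['t', 'a', 'c', 'h']
print(result["last_digraph"])   # ['at', 'he', 'nd']
```

Ties are broken alphabetically here; second-letter and third-letter lists follow the same pattern with `w[1]` and `w[2]`.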
To get these data, I created a large text file with a variety of text and used a program I wrote to count letter frequency. I made this web page because none of the above sites were detailed enough for me, and I thought others might feel the same way. For more information, see Theory of Letter Frequency.
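The counting program itself is not shown on this page, so the following is only a minimal Python sketch of the general approach: slide a window of length n over the text and tally each n-gram. The function name `ngram_ranking` and its parameters are hypothetical, and restricting the count to ASCII letters is an assumption made to match the letter, digraph and trigraph lists.

```python
from collections import Counter

def ngram_ranking(text, n=1, letters_only=True, top=None):
    """Return n-grams ordered from most to least frequent.

    n=1 gives single characters, n=2 digraphs, n=3 trigraphs.
    With letters_only=True, any n-gram containing a non-letter
    (space, digit, punctuation) is skipped.
    """
    text = text.lower()
    counts = Counter()
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if letters_only and not (gram.isascii() and gram.isalpha()):
            continue
        counts[gram] += 1
    # Most frequent first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]

sample = "the cat and the hat"
print(ngram_ranking(sample, n=1))         # ['t', 'a', 'h', 'e', 'c', 'd', 'n']
print(ngram_ranking(sample, n=2, top=3))  # ['at', 'he', 'th']
```

Passing `letters_only=False` counts every character, including spaces and punctuation, which is closer to what a full character-frequency list needs.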