by jdrock 6074 days ago
(1/26)^7 .. without getting too fancy with any sort of linguistic probabilities...

Well, let's get fancy:

  f	2.228%	0.02228
  u	2.758%	0.02758
  c	2.782%	0.02782
  k	0.772%	0.00772
  y	1.974%	0.01974
  o	7.507%	0.07507
  u	2.758%	0.02758

  =

  1/185,399,389,457
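
For anyone who wants to check the arithmetic, here is a minimal Python sketch that multiplies the seven per-letter frequencies from the table and compares the result with the uniform (1/26)^7 baseline. It assumes, as the calculation above does, that the first letters of the lines are independent of each other; the helper name `acrostic_odds` is made up for illustration.

```python
from math import prod

# Global letter frequencies copied from the table above (as fractions).
GLOBAL_FREQ = {
    "f": 0.02228, "u": 0.02758, "c": 0.02782,
    "k": 0.00772, "y": 0.01974, "o": 0.07507,
}

def acrostic_odds(word, freq):
    """Return N such that the chance of `word` is 1/N, assuming independent letters."""
    return 1 / prod(freq[ch] for ch in word)

uniform = 26 ** 7                               # every letter equally likely
weighted = acrostic_odds("fuckyou", GLOBAL_FREQ)

print(f"uniform:  1/{uniform:,}")               # 1/8,031,810,176
print(f"weighted: 1/{weighted:,.0f}")           # roughly 1 in 185 billion
```

The weighted figure is about 23 times less likely than the uniform one, mostly because "k" is such a rare letter.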
That uses global letter frequencies; what I really need is a table of first-letter frequencies.
I could not find a good first-letter table, so I made my own from Project Gutenberg texts:

  f	3.779%	0.0378
  u	1.487%	0.0149
  c	3.511%	0.0351
  k	0.690%	0.0069
  y	1.620%	0.0162
  o	6.264%	0.0626
  u	1.487%	0.0149

  =

  1/487,158,294,227
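
A rough sketch of how such a first-letter table could be built and used, in Python. This is not the poster's actual script: it assumes "first letter" means the first letter of every word in a plain-text corpus, and the function name `first_letter_freq` and the tiny sample string are placeholders. The second half recomputes the odds from the rounded fractions in the table above.

```python
import re
from collections import Counter
from math import prod

def first_letter_freq(text):
    """Fraction of words that start with each letter (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w[0] for w in words)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Stand-in corpus; swap in the full text of one or more Gutenberg books.
sample = "the quick brown fox jumps over the lazy dog"
sample_freq = first_letter_freq(sample)
print(sample_freq["t"])                         # 2 of the 9 words start with "t"

# Rounded first-letter fractions copied from the table above.
FIRST_FREQ = {
    "f": 0.0378, "u": 0.0149, "c": 0.0351,
    "k": 0.0069, "y": 0.0162, "o": 0.0626,
}
odds = 1 / prod(FIRST_FREQ[ch] for ch in "fuckyou")
print(f"1/{odds:,.0f}")
```

With the rounded four-decimal fractions this comes out at roughly 1 in 485 billion, slightly off from the figure quoted above, presumably because the original calculation used unrounded frequencies.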
That's the odds of it happening in 7 specific lines. But what are the odds of it happening by chance at any point in time to any person of Schwarzenegger's importance or higher?
Just multiply those odds by the percentage of people that qualify as important.
And by the number of 7-line sequences produced by each.
One also has to factor in that there are thousands of things he could have printed that would have had roughly the same effect (e.g., "Piss off").
= about 8,000,000,000 to 1
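
The "anywhere, by anyone" adjustment described above can be sketched as follows. Everything here is an assumption for illustration: the function name `chance_somewhere`, the independence of the 7-line windows, the idea that each alternative phrase is about as likely as the original, and above all the example counts, which are placeholders and not estimates from this thread.

```python
from math import expm1, log1p

def chance_somewhere(p_single, n_windows, n_phrases=1):
    """Probability that at least one of `n_phrases` roughly equally likely
    messages appears in at least one of `n_windows` independent 7-line windows."""
    p_any_phrase = min(1.0, p_single * n_phrases)
    if p_any_phrase >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p.
    return -expm1(n_windows * log1p(-p_any_phrase))

# Placeholder inputs only: a million 7-line windows and a thousand
# alternative phrases are made-up numbers, not data.
p = 1 / 487_158_294_227      # per-window odds from the first-letter table
print(chance_somewhere(p, n_windows=1_000_000, n_phrases=1_000))
```

For small probabilities the result is close to simply multiplying the three numbers together, which is the shortcut the comment above describes.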

But capitalization and spacing were correct too.

Capitalization would be correct in almost all cases. You always capitalize the first letter, and chances are the first word of each line won't be the beginning of a sentence. Ditto for spacing: 3-5 line paragraphs are fairly common.
How do you figure "almost all"? Even if the text could only contain paragraphs of length 3 and 4, the spacing alone would be incorrect in half the cases (assuming that the 3-4 and 4-3 splits are equally likely).

(Of course, we are leaving aside the fact that the full body of the message contains an additional one-line paragraph, which could also be considered part of a correctly capitalized and spaced message.)

One would have to analyze all previous relevant correspondence out of AS's office in order to measure vocabulary and other structural elements.

It would be a fun project.