| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mturmon 2845 days ago
	Hmm, nice result. That makes me want to map letters onto primes in order of frequency. So "e" is 2, "t" is 3, etc., see: http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/...

2 comments

romwell 2845 days ago

At this point, you're yak shaving, where the sane thing to do is simply to count the number of times each letter occurs.

Better complexity-wise as well, both in terms of speed and storage: O(n) and O(1), respectively. [1]

The proposed algorithm uses O(n) storage (to store the immense product), and given that integer multiplication complexity is somewhere between O(n) and O(n^2), we end up[2] with at least O(n^2) runtime complexity!

[1] ...assuming the input data is less than 15 million terabytes in size; O(n log n), O(log n) respectively for arbitrary length input. Storing the number of input bytes takes O(log n) space, and integer increment can take O(size).

[2]https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

link

mturmon 2844 days ago

OK, on my (longer at 235886 words) /usr/share/dict/words, I find that:

  sequential encoding: 21882 words overflow 2**64 (9.3%)
  frequency encoding: 2945 words overflow 2**64 (1.2%)

Some other trivia:

  mean #bits for those words over 64 bits: 
         seq = 72.0; freq = 69.3
  largest #bits: 
         seq = 115.4; freq = 101.0
  word w/ largest #bits: 
         seq = thyroparathyroidectomize [1]
         freq = pathologicopsychological [n/a]

[1] https://www.merriam-webster.com/medical/thyroparathyroidecto...

link