| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by amelius 2801 days ago
	Nice in theory, but in practice you wouldn't implement it like that, especially if the words can be longer than your machine integer allows. Sorting and comparing is more elegant than invoking a BigNum library, imho (and has smaller footprint). This shows that theoretical elegance != implementation elegance.

5 comments

LukeShu 2800 days ago

I was curious how soon overflowing a native integer would come up. The "worst case" would be all "z"s (which map to 103), so how many characters does floor(log_103(2^n-1)) get you?

    int32  zzzz      (4 characters)
    uint32 zzzz      (4 characters)
    int64  zzzzzzzzz (9 characters)
    uint64 zzzzzzzzz (9 characters)

But that's the worst case, not many real words have several "z"s in them. How often are real words affected? I did some experiments with /usr/share/dict/words:

    1-( 36123/123115) = 70%  overflow int32
    1-( 39774/123115) = 68%  overflow uint32
    1-(117909/123115) = 4.2% overflow int64
    1-(118533/123115) = 3.7% overflow uint64

link

mturmon 2800 days ago

Hmm, nice result. That makes me want to map letters onto primes in order of frequency. So "e" is 2, "t" is 3, etc., see: http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/...

link

romwell 2800 days ago

At this point, you're yak shaving, where the sane thing to do is simply to count the number of times each letter occurs.

Better complexity-wise as well, both in terms of speed and storage: O(n) and O(1), respectively. [1]

The proposed algorithm uses O(n) storage (to store the immense product), and given that integer multiplication complexity is somewhere between O(n) and O(n^2), we end up[2] with at least O(n^2) runtime complexity!

[1] ...assuming the input data is less than 15 million terabytes in size; O(n log n), O(log n) respectively for arbitrary length input. Storing the number of input bytes takes O(log n) space, and integer increment can take O(size).

[2]https://en.wikipedia.org/wiki/Computational_complexity_of_ma...

link

mturmon 2799 days ago

OK, on my (longer at 235886 words) /usr/share/dict/words, I find that:

  sequential encoding: 21882 words overflow 2**64 (9.3%)
  frequency encoding: 2945 words overflow 2**64 (1.2%)

Some other trivia:

  mean #bits for those words over 64 bits: 
         seq = 72.0; freq = 69.3
  largest #bits: 
         seq = 115.4; freq = 101.0
  word w/ largest #bits: 
         seq = thyroparathyroidectomize [1]
         freq = pathologicopsychological [n/a]

[1] https://www.merriam-webster.com/medical/thyroparathyroidecto...

link

thijsvandien 2800 days ago

Now add support for Unicode. :)

link

artemisyna 2801 days ago

Or, for better time complexity (with a bit of extra space) making a map of char -> frequency and comparing the results.

link

diegoperini 2800 days ago

That's Fourier transform!

link

DavidSJ 2800 days ago

Funny. :) But a Fourier transform is reversible.

link

coherentpony 2800 days ago

The title asked for beautiful/elegant algorithms, not efficient ones! :)

link

TuringTest 2800 days ago

In computing, inefficient algorithms are not beautiful.

link

pythux 2800 days ago

I wrote some comparison some time ago https://github.com/remusao/anagrams-bench/blob/master/README...

Unfortunately, as you suggest, the complexity of arbitrary precision multiplication kicks in and performance suffers.

link

brlewis 2800 days ago

One might also choose not to bring in a BigNum library if you don't need to find every anagram. Here's an exhaustive list of anagrams I found with sorting and comparing that a quick Fundamental Theorem implementation missed:

   [ 'basiparachromatin', 'Marsipobranchiata' ]
   [ 'configurationism', 'misconfiguration' ]
   [ 'constructionism', 'misconstruction' ]
   [ 'pericardiacophrenic', 'phrenicopericardiac' ]
   [ 'anatomicophysiologic', 'physiologicoanatomic' ]
   [ 'petrographically', 'pterylographical' ]

link

brlewis 2799 days ago

Note this is out of 20K anagrams.

Here's my code: https://github.com/brlewis/brlewis.github.io/blob/master/201...

On my machine, Fundamental Theorem of Arithmetic finished in 1.703 seconds, sorting letters in 1.954 seconds.

link