Hacker News new | ask | show | jobs
by mlochbaum 19 days ago
That's not O(N log log N), it's more like N^2. Prime sieves are hard to implement well with immutable arrays for obvious reasons; there are some cool methods but they're definitely harder. I'm ashamed to be part of a community that won't cop to this.

The algorithm iterates over numbers ⍺ from 2 to N, removing the multiples that are greater than ⍺ and no greater than N from p. If the removal with ~ has to inspect all of p, then all the primes are there so we have asymptotically at least N/log N entries by the prime number theorem and we get N^2/log N time (when ⍺ is over N/2, no multiples are in range so this can be skipped, but that just cuts a constant factor 2 from the time). Conceivably p could be marked sorted, so the entries to remove could be found with a binary search. This is a bit harder to analyze, but I think each prime under √N will cause the list to change, and incur N/log N data movement. So you get at least (N/log N)^(3/2) cost, still quite a bit worse than linear.

Edit: changed the algorithm while I was writing... the new one is better, it keeps one list p of primes and one list ω of not-yet-marked-out numbers. However, primes are removed from ω one at a time, so that each of the N/log N primes has to be moved for each one before it, giving at least (N/log N)^2 cost (I mean, maybe an interpreter could binary search and also recognize when ~ only drops the first entry and do that by slicing? But the (N/log N)^(3/2) from above definitely holds). Mutating a bit array in place is pretty important to classical sieve performance.

2 comments

It's possible to get O(N^(3/2)/log N) in an ordinary interpreter with some changes to the code, assuming linear-time ~ with hashing. The idea is to leave the primes in ⍵ and stop when it stops changing, which will happen at the first prime over √N. To get the complexity multiply by O(N) for each step. It's also a small change to get the multiples to remove to start at n×n instead of n.

    i←¯1⋄{⍵~n×n↓⍳1+⌊N÷n←i⊃⍵⊣i+←1}⍣≡1↓1+⍳N
I think this is about as good as can be done with ~ instead of marking out bits. And I wouldn't say it's as easy as the imperative version!
Eep. You're right. Evidently, I didn't even know what a sieve was in this context and wrote a search instead. You got me to do a bit of research. Actually, this discussion is exactly where I think APL shows one of it's strengths. It feels like a human communication tool more than any other PL I've mucked about with. The hard parts here are not language issues but fundamental understanding ones.

It's a tad tricky to carefully analyze the asymptotics of my above prime generator, since the search space of Without (~) shrinks on each iteration. I think Merten's theorem gives an estimate of e^-γ/log(p_i), which we need to sum for all primes up to N. Taking prime density 1/log(n) and integrating N/(log x)^2 over our range is O(N^2/log N), I think.

> Mutating a bit array in place is pretty important to classical sieve performance.

Challenge accepted:

    p⊣{x[⍵+n×⍳⌊N÷n←p⍪←x[⍵]]←0 ⋄ 1+⍣{(x⍪1)[⍵+1]≠0}⊢⍵}⍣{⍵=N-2}0⊣x p←(1↓1+⍳N)⍬
We just directly set roughly N/p items to 0 on each iteration—proper sieve semantics—which should give O(N log log N), unless I'm missing something.