HN Mirror


by sethammons 3309 days ago

They addressed your second point in the article. On a popular post, you would be storing several megabytes of data to capture/relate each unique user that visited. That gets expensive at scale. HLL (HyperLogLog) takes that down to a few kilobytes, less than 1% of the original size.

For your first suggestion, you would have to do a very expensive lookup. You couldn't cache it effectively because of the requirement for near-real-time stats. You could improve lookup time using columnar storage, but the performance and memory usage would be nowhere near as nice as with HLL.
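To give a rough feel for why the footprint stays so small, here is a minimal HyperLogLog sketch in Python. It is an illustration under assumptions, not the implementation from the article: the precision `p=12`, the use of SHA-1 as the hash, and the `user-N` ID format are all choices made up for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter (illustrative; parameters are assumptions)."""

    def __init__(self, p=12):
        self.p = p                          # number of bits used to pick a register
        self.m = 1 << p                     # number of registers (4096 when p=12)
        self.registers = bytearray(self.m)  # one byte per register
        # Bias-correction constant; this closed form applies for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value (an assumption;
        # any well-mixed 64-bit hash would do).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Raw estimate: bias-corrected harmonic mean of 2^register.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"user-{i}")  # hypothetical user IDs
    print(f"estimate: {hll.count():.0f}, state size: {len(hll.registers)} bytes")
```

The state is a fixed 4096 bytes no matter how many visitors are added, and re-adding a visitor never changes it, which is what makes it cheap to keep updated in near real time. The price is an approximate answer: the typical error with this many registers is on the order of 1.04/sqrt(4096), roughly 1.6%.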

Problems are harder at scale.

1 comment

eropple 3309 days ago

I've had a "phases of computing" article percolating for a while to this end. Problems aren't just harder at scale; they actively change their observable properties because of the stressors involved and where they crop up.
