by vijay_nair 2856 days ago

A few things I learned after a bit of research:

• the key to efficiency here seems to be “caching”, more specifically their caching strategy

• traditionally, caching on the web is done by assuming resource access follows the Zipf Distribution[1]

• Zeta Distributions are basically Zipf Distributions[2] so you can effectively re-word the title as “Efficient data loading using caching” (zipf = “caching” & zeta = zipf => zeta = “caching”)

• It’s important to note that Zipf/Zeta don’t model extremes very well, so there’s potential for outliers causing costly cache misses. Monitor your logs!
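To make the last point a bit more concrete, here is a rough sketch that replays Zipf-distributed requests against a small LRU cache and reports the hit rate. Everything in it is my own assumption for illustration (the function names, the catalog size, the exponent s, and LRU as the eviction policy); it is not the project's actual strategy.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_weights(n_items, s=1.0):
    """Unnormalized Zipf weights: the item at rank r gets weight 1 / r**s."""
    return [1.0 / rank ** s for rank in range(1, n_items + 1)]


def simulate_lru_hit_rate(n_items=1000, cache_size=100,
                          n_requests=50_000, s=1.0, seed=0):
    """Replay synthetic Zipf traffic through an LRU cache; return the hit rate."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    cum = list(accumulate(zipf_weights(n_items, s)))
    requests = rng.choices(range(n_items), cum_weights=cum, k=n_requests)

    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / n_requests


if __name__ == "__main__":
    for size in (10, 100):
        rate = simulate_lru_hit_rate(cache_size=size)
        print(f"cache_size={size}: hit rate {rate:.3f}")
```

Swapping the synthetic requests for keys parsed out of real access logs would show how far actual traffic strays from the model.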

---

Further reading:

• https://pdfs.semanticscholar.org/337e/4b7f57ccbb7485950b93da... (1999)

• https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs...

• https://en.wikipedia.org/wiki/Zipf%27s_law

• https://www.springer.com/in/book/9781402080494

---

[1] the distribution follows a power law: access frequency is inversely proportional to popularity rank, so the most popular resource is accessed disproportionately more than the second most popular item and so on.

An example is word frequency, modeled as 1/n: the second most popular word occurs 50% as often as the most popular word (1/2), the third occurs 33% as often (1/3) and so on, showing a steep falloff with a long tail. It thus makes sense to cache the most popular items, since a small head of the distribution takes a disproportionate share of accesses. How large that share is depends on how many items there are in total: under a pure 1/n law, the top 10 of 1,000 items draw roughly 39% of accesses and the top 100 roughly 69%. This is in the same family as the Pareto Distribution (20% of the things deliver 80% of the result).
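How much traffic the top few items capture is easy to check numerically. A minimal sketch, assuming a pure Zipf law over a finite catalog (the function names and the parameters n_items and s are mine, picked for illustration):

```python
def harmonic(n, s=1.0):
    """Generalized harmonic number: sum of 1 / k**s for k = 1..n."""
    return sum(1.0 / k ** s for k in range(1, n + 1))


def top_k_share(k, n_items, s=1.0):
    """Fraction of all accesses that go to the k most popular of n_items."""
    return harmonic(k, s) / harmonic(n_items, s)


if __name__ == "__main__":
    for k in (10, 100):
        print(f"top {k} of 1000: {top_k_share(k, 1000):.1%}")
```

The share grows with the exponent s and shrinks as the catalog gets larger, so the right cache size depends on both.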

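[2] rigorously speaking, the zeta distribution is Zipf extended to an unbounded number of ranks: Zipf is normalized over a finite set of N items by the generalized harmonic number, while zeta is normalized over all positive integers by the Riemann zeta function ζ(s), which requires s > 1. But practically they are similar enough that people use the terms interchangeably.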

1 comments

fed135 2848 days ago

Damn, it's like I don't even need to write the paper at all :) Great research work, it does capture the idea of the project.

vijay_nair 2847 days ago

Given that this is HN, my comment probably came across as a disappointment, since most people here were already aware of the surface-level details of caching.

Hope I kept them at least mildly entertained while waiting for the real deal to drop : )

fed135 2836 days ago

Here's the recently updated Wiki page, it's not super in-depth, but please let me know what you think :)