A few things I learned after a bit of research:

• The key to efficiency here seems to be caching, and more specifically their caching strategy.
• Traditionally, caching on the web is done by assuming resource access follows the Zipf distribution [1].
• Zeta distributions are basically Zipf distributions [2], so you can effectively re-word the title as “Efficient data loading using caching” (Zipf = “caching” & zeta = Zipf => zeta = “caching”).
• It’s important to note that Zipf/zeta don’t model extremes very well, so there’s potential for outliers causing costly cache misses. Monitor your logs!

---

Further reading:

• https://pdfs.semanticscholar.org/337e/4b7f57ccbb7485950b93da... (1999)
• https://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs...
• https://en.wikipedia.org/wiki/Zipf%27s_law
• https://www.springer.com/in/book/9781402080494

---

[1] The distribution follows a power law, so the most popular resource is accessed disproportionately more than the second most popular one, and so on. The classic example is word frequency, modeled as 1/n: the second most popular word occurs 50% as often as the most popular word (1/2), the third occurs 33% as often (1/3), and so on, giving a power-law (not exponential) falloff with a long tail. It thus makes sense to cache the most popular items, since they receive a disproportionate share of accesses. How large that share is depends on the catalog size: under a pure 1/n model with N items, the top k items get H_k/H_N of the accesses, where H_n is the n-th harmonic number. For the top 10 of 1,000 items that is about 39%, so a steeper exponent or a smaller catalog is needed before a 10-item cache covers most of the traffic. Zipf is the discrete counterpart of the Pareto distribution (20% of the things deliver 80% of the result).

[2] Rigorously speaking, Zipf is defined over a finite number of items N and normalized by a generalized harmonic number, while zeta is its N → ∞ limit, normalized by the Riemann zeta function ζ(s), which requires an exponent s > 1. In practice they are similar enough that people use the terms interchangeably.
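
To put rough numbers on footnote [1], here is a minimal sketch of how much traffic the k most popular items would capture if accesses followed an idealized Zipf law with weights 1/rank^s. The function names `zipf_weights` and `top_k_share` and the exponent parameter `s` are my own choices for illustration, not anything from the linked article, and real access logs will only approximate this model.

```python
def zipf_weights(n_items: int, s: float = 1.0) -> list[float]:
    """Unnormalized Zipf weights 1/rank**s for ranks 1..n_items."""
    return [1.0 / rank ** s for rank in range(1, n_items + 1)]


def top_k_share(n_items: int, k: int, s: float = 1.0) -> float:
    """Fraction of all accesses that hit the k most popular items.

    Assumes an idealized Zipf model; this is also the best-case hit
    rate of a cache that always holds exactly those k items.
    """
    weights = zipf_weights(n_items, s)
    return sum(weights[:k]) / sum(weights)


if __name__ == "__main__":
    # Compare a 10-item cache across catalog sizes and exponents.
    for n_items in (100, 1_000, 10_000):
        for s in (1.0, 1.5, 2.0):
            share = top_k_share(n_items, k=10, s=s)
            print(f"N={n_items:>6} s={s}: top-10 share = {share:.1%}")
```

With s = 1 the share of a fixed-size cache shrinks as the catalog grows, while with larger s the head dominates and the share stays high. That is one reason to measure the exponent from your own logs before sizing a cache.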