| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jandrewrogers 1331 days ago
	All cache replacement algorithms are literally equivalent to universal sequence prediction problems, per the optimality theorem. There is no implication of sequential decisions here. When you schedule a batch of disk I/O, you are essentially front-running the sequence predictor to avoid classes of prediction failure where successful prediction would be computationally intractable (and therefore not implemented in real systems), which is expected to produce better I/O throughput on average if done competently per the same theory. There is nothing magic about this, it is in the literature, and databases in particular have explicitly exploited non-sequential scheduling to circumvent fundamental sequence prediction limits for decades. Optimally anticipating future requirements for reads and writes can be called whatever you like, but that remains the primary use case for async I/O since you can't do it with blocking I/O in a single thread. This becomes more important as caches become larger because cache efficiency increases are strongly sublinear as a function of size, as expected. Servers are already at the scale where very deep async I/O scheduling is required for consistent throughput with high storage density, beyond what can be done via traditional buffered disk I/O architectures, async or not. It is an active area of research with some interesting ideas.