	by bob1029 2196 days ago
	We effectively do dirty reads when we copy the databases for offline analysis. All of our aggregation use cases tolerate incomplete recent data. Most of the time we run the analysis at least 24 hours after the event in question, so we never end up unable to answer an important business question because data hasn't been flushed to disk yet. Because most of what we care about is time-domain and causal, we can usually rely on this basic idea. Very rarely does a time-series aggregate query need to be consistent with the OLTP workload in order to answer a business question.
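
	To make the idea concrete, here is a minimal sketch of the copy-then-aggregate pattern. It rests on assumptions that are not in the comment above: the store is SQLite, there is a hypothetical `events` table with a `ts` column holding naive UTC ISO-8601 strings, and the copy is made with Python's `sqlite3.Connection.backup` (which takes a consistent snapshot, so it is a stand-in for a raw file copy, not the same thing). The 24-hour settling window is a parameter.

```python
import sqlite3
from datetime import datetime, timedelta


def snapshot(live: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the live database into a separate in-memory database.

    Assumption: SQLite. Analysis then runs against the copy, so it never
    competes with the OLTP workload on the live connection.
    """
    copy = sqlite3.connect(":memory:")
    live.backup(copy)
    return copy


def hourly_counts(copy: sqlite3.Connection, now: datetime,
                  settle: timedelta = timedelta(hours=24)):
    """Count events per hour, ignoring anything newer than `now - settle`.

    Assumption: `events.ts` holds naive UTC ISO-8601 strings, so string
    comparison matches time order. Rows inside the settling window may be
    incomplete in the copy, so they are simply left out.
    """
    cutoff = (now - settle).isoformat()
    return copy.execute(
        "SELECT substr(ts, 1, 13) AS hour_bucket, COUNT(*) "
        "FROM events WHERE ts < ? "
        "GROUP BY hour_bucket ORDER BY hour_bucket",
        (cutoff,),
    ).fetchall()
```

	The cutoff is what makes the stale copy harmless: any row that might be missing or half-written at copy time falls inside the window that the query already excludes.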