| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by awgupta 3352 days ago
	it really depends on what you are doing. A large data set shouldn't be limited to longitudinal analysis. If you're storing every log record or every stock bid/ask, there may be times that you need to understand the specifics of what exactly was going on. There may be a lot of filtering on the underlying corpus for these sorts of exact match queries, but data set sizes continue to grow.

1 comments

awgupta 3352 days ago

that said, I agree that approximate functions should be part of a modern database system. Redshift has approximate count distinct (based on hyperloglog) and approximate percentiles (based on quantile summaries)

link