Hacker News
by hodgesrm 390 days ago
> If we look at the time a bit closer, we see the queries take anywhere between a minute and half an hour. Those are not unreasonable waiting times for analytical queries on that sort of data in any way.

I'm really skeptical of arguments that say it's OK to be slow. Even in the modern-laptop example, queries still take up to 47 seconds.

Granted, I'm not looking at the queries, but the fact is that there are a lot of applications where users need results back in less than a second.[0] If the results feed automated processes like page rendering, they need them back in tens of milliseconds at most. That takes hardware to accomplish consistently, especially if the datasets are large.
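The word "consistently" matters because latency targets are usually judged at the tail, not the average. Here is a minimal Python sketch. The latency samples and the 50 ms budget are hypothetical numbers chosen for illustration, and `percentile` and `meets_budget` are made-up helper names. It shows how a mean can look fine while the 99th percentile blows the budget.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile (p in 0..100) of latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def meets_budget(samples_ms, budget_ms, p=99.0):
    """True if the p-th percentile latency is within the budget."""
    return percentile(samples_ms, p) <= budget_ms

# Hypothetical measurements: mostly fast queries plus a slow tail.
latencies_ms = [12] * 95 + [15, 18, 40, 250, 900]

print(sum(latencies_ms) / len(latencies_ms))     # mean: 23.63
print(percentile(latencies_ms, 50))              # p50: 12
print(percentile(latencies_ms, 99))              # p99: 250
print(meets_budget(latencies_ms, budget_ms=50))  # False
```

The mean (about 24 ms) sits well under the hypothetical 50 ms budget, but one query in a hundred takes 250 ms or more. Making that tail consistently small is where the extra hardware goes.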

The small-data argument becomes even weaker when you consider that analytic databases don't just run queries on static datasets. Large datasets got that way by absorbing a lot of data very quickly. The databases therefore also do ingest, compaction, and transformations. These require resources, especially if they run in parallel with queries on the same data. Scaling them independently requires distributed systems. There isn't another solution.

[0] SIEM, log management, trace management, monitoring dashboards, ... These all involve potentially large datasets where people sift through data very quickly and repeatedly. Nobody wants to wait more than a couple of seconds for results to come back.