


	by eklitzke 891 days ago
	I haven't looked at the exact schema for this dataset, but for this type of query pattern to be efficient the data would need to be partitioned by date.^[1] My guess is that it isn't partitioned this way, so each query looking at "one month" of data was doing a full table scan: query N months and you pay for N table scans. Even without partitioning, the exact same results could have been obtained from a single table scan with some kind of aggregation clause (e.g. GROUP BY).
	[1]: https://cloud.google.com/bigquery/docs/partitioned-tables
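To make the N-scans-versus-one-scan point concrete, here is a rough sketch using Python's built-in sqlite3 rather than BigQuery. The `events` table, its `day` and `value` columns, and the sample rows are all made up for illustration, since I don't know the real schema. It only shows that the two approaches return the same totals; it says nothing about BigQuery's actual billing or execution.

```python
import sqlite3

# Hypothetical stand-in table; the real dataset's schema is unknown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2020-01-05", 10),
        ("2020-01-20", 5),
        ("2020-02-03", 7),
        ("2020-03-14", 1),
        ("2020-03-30", 2),
    ],
)

months = ["2020-01", "2020-02", "2020-03"]


def per_month_queries(conn, months):
    """N separate queries: on an unpartitioned table, each one reads every row."""
    totals = {}
    for month in months:
        row = conn.execute(
            "SELECT SUM(value) FROM events WHERE substr(day, 1, 7) = ?",
            (month,),
        ).fetchone()
        totals[month] = row[0]
    return totals


def single_grouped_query(conn):
    """One query: a single pass over the table, aggregated per month."""
    rows = conn.execute(
        "SELECT substr(day, 1, 7) AS month, SUM(value) "
        "FROM events GROUP BY month ORDER BY month"
    ).fetchall()
    return dict(rows)


per_month = per_month_queries(conn, months)
grouped = single_grouped_query(conn)
```

Both `per_month` and `grouped` end up holding the same month-to-total mapping, but the first issues one query per month while the second issues exactly one.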