| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by chrisjc 952 days ago

and guess what those tables use to store data a lot of the time? Parquet.

I assume that when you say "tables" you mean "external tables" since you're in the data world.

If you didn't then I guess when you say "tables", you mean "tables" with everything including the kitchen sink? Database, compute, etc...? Does the database always have to be running for the table to be accessible? Responsibility for the database hardware, resources or services?

Of course there are fantastic new databases like Snowflake and BigQuery that separate compute and storage... but do they, really? Separating storage and compute is just incredible for scaling, suspend/resume, etc. But can you query a Snowflake/BigQuery table without also having to use their compute? Is there a way that I can just get a "table" and not be forced into using a specific compute-engine and all the other bells and whistles?

So when you say "table", where and how do I get one? And to maintain the theme of the article, a columnar/OLAP/analytics "table" in particular?

As you probably know, there are several (external table) options, Apache Iceberg probably being the most talked about one at the moment. External "tables" are just collections of metadata about your files, or conventions about how to lay your files down. When you query these tables with SQL using athena, redshift, snowflake, duckdb, etc... each and everyone of those query-engines is reading parquet files.

(Snowflake, BigQuery and others are working on features to both manage and read Iceberg tables, so i kinda lied earlier)