Hacker News
by adolph 887 days ago
> a data lake with 40,000 Parquet files. You need to list the files before you can read the data. This can take a few minutes.

Sounds like this data lake could use a Parquet file listing the Parquet files.


1 comment

Yeah, that's exactly what Delta Lake does. All the table metadata is stored in Parquet files (it's initially written as JSON files, which are eventually compacted into Parquet). These tables are sometimes so huge that the table metadata is itself big data.
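
To make the idea concrete, here's a rough stdlib-only sketch of a file-listing log with compaction. It is not Delta Lake's actual implementation: the function names (`commit`, `checkpoint`, `live_files`) and the `.checkpoint.json` suffix are made up for illustration, the actions are stripped down to just a `path`, and the checkpoint is written as JSON to avoid a Parquet dependency, whereas the comment above says Delta Lake compacts into Parquet.

```python
import json
import tempfile
from pathlib import Path


def commit(log_dir: Path, version: int, actions: list[dict]) -> None:
    """Write one commit as newline-delimited JSON (hypothetical layout)."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


def live_files(log_dir: Path) -> set[str]:
    """Rebuild the current file list from the log, without listing the data directory."""
    files: set[str] = set()
    start = -1  # version covered by the latest checkpoint, if any

    checkpoints = sorted(log_dir.glob("*.checkpoint.json"))
    if checkpoints:
        latest = checkpoints[-1]
        start = int(latest.name.split(".")[0])
        files = set(json.loads(latest.read_text()))

    # Replay only the commits newer than the checkpoint.
    for commit_file in sorted(log_dir.glob("*.json")):
        if commit_file.name.endswith(".checkpoint.json"):
            continue
        if int(commit_file.stem) <= start:
            continue
        for line in commit_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


def checkpoint(log_dir: Path, version: int) -> None:
    """Compact everything up to `version` into one snapshot.

    Assumption: JSON is used here for simplicity; a Parquet writer
    would take its place in a real system.
    """
    snapshot = sorted(live_files(log_dir))
    (log_dir / f"{version:020d}.checkpoint.json").write_text(json.dumps(snapshot))


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}},
                            {"add": {"path": "part-1.parquet"}}])
        commit(log_dir, 1, [{"remove": {"path": "part-0.parquet"}},
                            {"add": {"path": "part-2.parquet"}}])
        checkpoint(log_dir, 1)
        commit(log_dir, 2, [{"add": {"path": "part-3.parquet"}}])
        return sorted(live_files(log_dir))
```

A reader only has to load the latest checkpoint plus the handful of commits after it, so the cost of finding the files no longer scales with a 40,000-entry directory listing.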