Hacker News new | ask | show | jobs
by mcdonje 1201 days ago
Apache Spark / Databricks is an example of this. Parquet files are stored in folders. A folder is assumed to hold one dataset split into multiple files based on specified partition criteria. The VMs read the necessary files into memory and then operate on it.