by tomnipotent
1685 days ago
A data lake can be home to many different data formats (e.g. Parquet, Avro, Thrift, protobuf, ORC, HDF5, CSV, JSON), all co-existing. Spark lets you create a virtual abstraction over all of this and query it as though it were a homogeneous database. There's no need to import data into a centralized format and schema. This all ties back to the "old" Hadoop days: it's an evolution of running compute over data that isn't in a fixed, managed format or schema.
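
To make the "query it in place" idea concrete, here's a rough stdlib-only Python sketch. It is not Spark and not how Spark is implemented; it just illustrates schema-on-read over mixed formats. The file names, the `user`/`amount` schema, and the helpers `scan` and `total_by_user` are all made up for the example.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_csv(path):
    """Yield each CSV row as a dict, casting fields to the shared schema."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"user": row["user"], "amount": float(row["amount"])}


def read_json_lines(path):
    """Yield each JSON-lines record as a dict in the same shared schema."""
    with open(path) as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                yield {"user": rec["user"], "amount": float(rec["amount"])}


# Hypothetical registry: file extension -> reader. A real engine would
# also cover Parquet, ORC, Avro and so on.
READERS = {".csv": read_csv, ".jsonl": read_json_lines}


def scan(lake_dir):
    """Lazily yield rows from every supported file as one logical table.

    Nothing is copied into a central store; each file is read where it sits.
    """
    for path in sorted(Path(lake_dir).iterdir()):
        reader = READERS.get(path.suffix)
        if reader is not None:
            yield from reader(path)


def total_by_user(rows):
    """Rough equivalent of: SELECT user, SUM(amount) FROM lake GROUP BY user."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


def demo():
    """Build a tiny two-format 'lake' in a temp dir and query across it."""
    with tempfile.TemporaryDirectory() as lake:
        Path(lake, "orders.csv").write_text("user,amount\nalice,10.5\nbob,3\n")
        Path(lake, "orders.jsonl").write_text(
            '{"user": "alice", "amount": 2}\n'
            '{"user": "carol", "amount": 7.25}\n'
        )
        return total_by_user(scan(lake))


if __name__ == "__main__":
    print(demo())
```

The schema is applied at read time by each reader, so the query code never needs to know which format a row came from. In Spark the same role is played by its per-format readers plus a SQL or DataFrame layer on top, with the work distributed across a cluster.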