| This is a big deal in the database world because Delta Lake, Iceberg and Hudi mean that data is stored in an open-source format, often on S3. The storage and much of the processing are being standardised, so you can move between databases easily, and almost all tools will eventually be able to work with the same set of files in a transactionally sound way. For instance, Snowflake could be writing to a table, a data scientist could be querying the data live from a Jupyter notebook, and ClickHouse could be serving user-facing analytics against the same data with consistency guarantees. If the business then decides to switch from Snowflake to Databricks, it isn’t such a big deal. Right now, querying these formats on S3 isn’t quite as fast as querying natively ingested data, but the market will force every database vendor to optimise until performance approaches that of native ingestion. It’s a great win for openness and open source, and for businesses that want their data in open and portable formats. The lakehouse has similar implications. Lots of companies have both data lakes and data warehouses and end up copying data between the two. Being able to query a single copy of the data and manage just one system is equally impactful. It’s a very interesting time to be in the data engineering world. |