by hodgesrm
1737 days ago
Thanks for your comment, and sorry if I was unclear. I'm not arguing that storage and compute need to be directly coupled. However, storage does need to be very carefully optimized to match compute, especially when you are trying to read events and make them available immediately. ClickHouse, for example, has multiple formats for table parts in order to allow efficient buffering of rapidly arriving records. Using customized formats has allowed the project to evolve quickly.

In fact, the Lakehouse paper seems to be setting up a strawman. Here are three examples.

* The new low-latency SQL data warehouses are open source. They are not locking data in proprietary formats. We're not Snowflake.
* SQL data warehouses are already headed toward support for object storage for the same reasons everyone else is: cost and durability for large datasets. Here's just one sample of many: https://altinity.com/blog/tips-for-high-performance-clickhou...
* Not everyone cares about ML and data warehouse integration. From my experience working on ClickHouse, only a small percentage of users integrate ML. By contrast, 100% of our users care about efficient visualization and keeping data pipelines as short as possible, hence the benefit of a tightly integrated server.

I think there's actually a bifurcation of the market into low-latency use cases driven by event streams versus much larger datasets containing unstructured/semi-structured data stored in low-cost object storage. Lakehouse addresses the latter; SQL data warehouses are focused on the former. I don't see one "winning": both markets are growing.
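To illustrate the point about ClickHouse using multiple table part formats to buffer rapidly arriving records, here is a rough Python model of the idea, not ClickHouse code. It assumes the MergeTree behavior described in the ClickHouse docs: a part whose size or row count falls below the `min_bytes_for_wide_part` / `min_rows_for_wide_part` thresholds is stored in Compact format (all columns in one file), otherwise in Wide format (one file per column). The `PartStats` class, the `choose_part_format` function, and the threshold values in the usage note are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartStats:
    """Hypothetical summary of a newly written data part."""
    rows: int
    size_bytes: int


def choose_part_format(stats: PartStats,
                       min_bytes_for_wide_part: int,
                       min_rows_for_wide_part: int) -> str:
    """Sketch of the Compact-vs-Wide decision for a MergeTree-style part.

    Small parts (typical of frequent, small inserts from an event stream)
    go to Compact format, which keeps all columns in a single file and so
    avoids creating many tiny per-column files. Larger parts, e.g. the
    result of background merges, go to Wide format with one file per column.
    """
    if (stats.size_bytes < min_bytes_for_wide_part
            or stats.rows < min_rows_for_wide_part):
        return "Compact"
    return "Wide"
```

For example, with a hypothetical 10 MiB byte threshold and no row threshold, `choose_part_format(PartStats(1_000, 64 * 1024), 10 * 1024 * 1024, 0)` yields `"Compact"` for a small insert batch, while a 2 GiB merged part yields `"Wide"`. The design choice is that many small files hurt ingest more than scan efficiency does, so the format tracks the part's size rather than being fixed per table.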