by chadthenderson
3997 days ago
This looks very cool, although I'm not sure I fully understand how it can be used to replace batch ETL processes. PipelineDB eliminates batch ETL processing by incrementally inserting data into continuous views, but the documentation says that it's not meant for ad-hoc data warehousing because the raw data is discarded. So does that leave me still using batch processes to load my data warehouse? Or is PipelineDB going to be my data warehouse, as long as I only want the resulting streamed data? I'm just trying to figure out what this would look like and where it fits in a data warehouse environment.
In terms of not requiring that raw data be stored, a typical setup is to keep raw data somewhere cheap (like S3) so that it's there when you need it. But granular data is often overwhelmingly cold and never looked at again, so it may not always be necessary to store all of it in an interactively queryable datastore.
As I mentioned, PipelineDB certainly doesn't aim to be a monolithic replacement for all adjacent data processing technologies, but there are areas where it can definitely introduce significant efficiency gains.
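To make the setup described above a bit more concrete, here is a rough Python sketch of the idea, not PipelineDB's actual API. The names `ContinuousView`, `ingest`, and the `page`/`latency_ms` event fields are hypothetical, and a plain list stands in for cheap raw storage such as S3. Each event is archived once, folded into a running aggregate, and then dropped from the queryable side.

```python
import json
from collections import defaultdict


class ContinuousView:
    """Toy stand-in for a continuous view: stores aggregates, never raw rows."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.latency_sums = defaultdict(float)

    def insert(self, event):
        # Fold the event into the running aggregates; the event itself is not kept.
        key = event["page"]
        self.counts[key] += 1
        self.latency_sums[key] += event["latency_ms"]

    def query(self, page):
        # Answer interactive queries from the aggregates alone.
        n = self.counts.get(page, 0)
        avg = self.latency_sums[page] / n if n else None
        return {"count": n, "avg_latency_ms": avg}


def ingest(events, view, archive):
    """Archive each raw event cheaply, then update the view incrementally."""
    for event in events:
        archive.append(json.dumps(event))  # assumption: S3 or similar in practice
        view.insert(event)


view = ContinuousView()
archive = []  # hypothetical cold store for raw data
ingest(
    [
        {"page": "/home", "latency_ms": 10},
        {"page": "/home", "latency_ms": 20},
        {"page": "/docs", "latency_ms": 30},
    ],
    view,
    archive,
)
result = view.query("/home")
```

In this sketch the queryable state grows with the number of distinct pages rather than the number of events, while the archive still holds every raw event if a batch job ever needs to replay them.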