by mrwnmonm
1044 days ago
> Serra is a low-code, object-oriented ETL framework that allows developers to write PySpark jobs easily: think end-to-end dbt with the benefits of object-oriented Spark.

Could you please explain this as if I am three years old? (Also, I don't know dbt.)
ETL is the process of extracting, transforming, and loading data from a source to a destination in a data pipeline. Spark, an engine for large-scale data processing, lets us write code that works with large amounts of data. dbt is a tool you can use to break up your SQL scripts into smaller "models": other SQL scripts that can be reused and tested.
We describe ourselves as end-to-end because we also have extractors and loaders, whereas dbt focuses on the T (the transformation step of ETL). Each step involved in extraction, transformation, and loading corresponds to a specific Python object defined in our Python framework. I have also updated the README in our repo to hopefully better explain how the config file links to user-defined readers, writers, and transformers.
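
To make the "each step is a Python object" idea concrete, here is a rough sketch in plain Python. Everything in it is hypothetical and for illustration only: the class names (`ListReader`, `UppercaseTransformer`, `ListWriter`), the `REGISTRY` lookup, and the config layout are assumptions, not Serra's actual API, and it uses in-memory lists instead of Spark DataFrames so it runs without any dependencies.

```python
# Hypothetical sketch: names and config layout are illustrative, not Serra's API.

class ListReader:
    """Extract step: reads rows from an in-memory list."""
    def __init__(self, rows):
        self.rows = rows

    def read(self):
        return list(self.rows)


class UppercaseTransformer:
    """Transform step: upper-cases one string column in every row."""
    def __init__(self, column):
        self.column = column

    def transform(self, rows):
        return [{**row, self.column: row[self.column].upper()} for row in rows]


class ListWriter:
    """Load step: appends rows to a destination list."""
    def __init__(self, sink):
        self.sink = sink

    def write(self, rows):
        self.sink.extend(rows)


# Maps the "type" named in the config to the class that implements it.
REGISTRY = {
    "list_reader": ListReader,
    "uppercase": UppercaseTransformer,
    "list_writer": ListWriter,
}


def run_pipeline(config):
    """Build one object per configured step and run the steps in order."""
    rows = None
    for step in config["steps"]:
        obj = REGISTRY[step["type"]](**step.get("args", {}))
        if hasattr(obj, "read"):
            rows = obj.read()
        elif hasattr(obj, "transform"):
            rows = obj.transform(rows)
        else:
            obj.write(rows)
    return rows


sink = []
config = {
    "steps": [
        {"type": "list_reader", "args": {"rows": [{"name": "ada"}]}},
        {"type": "uppercase", "args": {"column": "name"}},
        {"type": "list_writer", "args": {"sink": sink}},
    ]
}
run_pipeline(config)
print(sink)
```

The point is just the shape: the config names the steps, and each name resolves to a small object that knows how to read, transform, or write. A real framework would presumably load the config from a file and pass Spark DataFrames between steps.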