| Does any of you have some recommendation on how to set up an in-memory data transformation system. I have been looking at DuckDB[1], but maybe there are interesting alternatives. Apache Arrow seems promising in some ways too. It is about running a series of transformations, defined in SQL, on many tables, but: - All the data can fit in the main memory of a single server (a few TB) - No change in the data during processing (batch processing, all source files are Parquet files, all output files are Parquet files) - Only one user at a time running the transformation (no need for transaction or isolation) - No need for persistence during the computation of intermediate tables From my understanding of how a database works, that should remove a lot of the bookkeeping that a standard database has to do in order to provide transactions and ACID properties. In return, one can expect an improvement in performance. I am looking at a kind of "memory only database". DuckDB looks like it should work, but I am not sure how mature it is. [1] https://duckdb.org/
[2] https://arrow.apache.org/ |