by luizfelberti 1158 days ago
How well does Cozo handle larger-than-RAM data? Does it only need to keep the query's answer set in memory?

By larger-than-RAM I mean the entire WikiData knowledge graph (~100GB), with something like 16GB of RAM.

Another question: any plans for supporting Parquet files with query pushdown? I honestly doubt Parquet's efficiency can be matched with RocksDB (but I'm happy to be proven wrong), and having to convert big datasets is always a pain...


For the Parquet question: currently CozoDB is developed by a single developer (me), but I am starting to explore ways of expanding the development team. Certainly a lot more features will be added if that happens, and Parquet support looks like a really useful one.
Any contact info for you to discuss contributing?
Yes, this is correct: only the query's answer set needs to be in memory. We are also working on streaming for the Rust API, in which case you don't even need to keep the whole set in memory for simple queries.

FYI, here is a not very rigorous performance and memory usage analysis (for a previous version without the vector search capability): https://docs.cozodb.org/en/latest/releases/v0.3.html

Thanks for the speedy response!

Cozo is looking like a top contender for my project so far :)