by mscavnicky 457 days ago
DataFusion wasn’t a fit because it doesn’t do external indexes. But why not extend it?
1 comment

We actually tried to extend DataFusion at first, but ultimately decided against it, since we can get most of the value by using Arrow and its compute kernels directly. DataFusion also executes filters in a way that rebuilds the underlying arrays (a data copy) and requires a strict schema, which is not a good fit for our schemaless, document-oriented model. In the end, switching from DataFusion to Reactor gave us 3x better latencies.
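
To make the data-copy point concrete, here is a minimal, library-free sketch of the two ways a filter can hand its result downstream. It is only an illustration: columns are plain Python lists, all function names are hypothetical, and nothing here is DataFusion's or Reactor's actual code. Whether Reactor uses a selection vector like this is an assumption; the reply only says that rebuilding arrays was a cost they wanted to avoid.

```python
from typing import Callable, Dict, List

# A toy columnar batch: column name -> list of values (assumption: ints only).
Column = List[int]
Batch = Dict[str, Column]


def filter_materialize(batch: Batch, column: str, pred: Callable[[int], bool]) -> Batch:
    """Copying filter: rebuild every column so it holds only the matching rows."""
    keep = [i for i, v in enumerate(batch[column]) if pred(v)]
    return {name: [values[i] for i in keep] for name, values in batch.items()}


def filter_select(batch: Batch, column: str, pred: Callable[[int], bool]) -> List[int]:
    """Non-copying filter: return only the matching row indices (a selection vector)."""
    return [i for i, v in enumerate(batch[column]) if pred(v)]


def sum_selected(batch: Batch, column: str, selection: List[int]) -> int:
    """Downstream operator that reads the original column through the selection."""
    values = batch[column]
    return sum(values[i] for i in selection)


batch = {"price": [5, 20, 7, 30], "qty": [1, 2, 3, 4]}

# Copying path: every column in the batch is rebuilt.
copied = filter_materialize(batch, "price", lambda p: p > 6)

# Selection path: the original columns stay untouched; only indices are produced.
selection = filter_select(batch, "price", lambda p: p > 6)

print(copied)                                 # {'price': [20, 7, 30], 'qty': [2, 3, 4]}
print(selection)                              # [1, 2, 3]
print(sum_selected(batch, "qty", selection))  # 9
```

The copying path touches every column in the batch, including ones the query may never read, while the selection path defers that work to whichever operator actually needs the values.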