If the columns are scalar then consider a column store.
Typical use case:
Filtering billions of 'docs' or rows across 50 attributes in any number of combinations
A doc holds data on the tech stack of a URL, for one of the products. Ever-evolving schema (hence why we picked Mongo)
--
The number of queries is low, but the complexity of the queries is high
We've already moved object storage from JSON to Parquet before uploading to Mongo, so object storage is already down to a quarter of what it was
The question is whether something can match the querying performance of Mongo but with better storage efficiency
As the schema evolves, simply add new columns.
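A minimal sketch of what "simply add new columns" looks like in practice. It uses Python's built-in sqlite3 only because it needs no install (it is not a column store); the table and column names (docs, cms, uses_cdn, analytics) are made up for illustration. The same ALTER TABLE ... ADD COLUMN pattern is what you would run in a columnar engine.

```python
import sqlite3

# sqlite3 stands in for a real column store here; names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (url TEXT PRIMARY KEY, cms TEXT, uses_cdn INTEGER)")
con.execute("INSERT INTO docs VALUES ('https://a.example', 'wordpress', 1)")

# A new attribute shows up: add a column instead of rewriting existing rows.
con.execute("ALTER TABLE docs ADD COLUMN analytics TEXT")
con.execute("INSERT INTO docs VALUES ('https://b.example', 'drupal', 0, 'matomo')")

# Rows written before the change read the new column as NULL (None in Python).
rows = con.execute("SELECT url, analytics FROM docs ORDER BY url").fetchall()
print(rows)
```

Older rows don't need a backfill; queries on the new attribute just have to treat NULL as "not recorded yet".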
Some attributes are scalar but others are not
There are arrays of objects in a lot of places. Will need to modify the schema to be completely scalar
Will run a local test & see how this goes. Thanks a lot!
Now you have a two-table schema with (at most) one join in a given query.
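One possible reading of the two-table layout, as a rough sketch: scalar attributes stay in a parent table and each element of an array of objects becomes a row in a child table. Everything here is an assumption for illustration (the docs / doc_techs tables, the techs field, the sample data), and sqlite3 is used only because it ships with Python.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, url TEXT, country TEXT);
CREATE TABLE doc_techs (doc_id INTEGER, name TEXT, version TEXT);
""")

# Hypothetical documents: scalar fields plus one array of objects.
docs = [
    {"url": "https://a.example", "country": "US",
     "techs": [{"name": "react", "version": "18"},
               {"name": "nginx", "version": "1.25"}]},
    {"url": "https://b.example", "country": "DE",
     "techs": [{"name": "vue", "version": "3"}]},
]

for doc_id, doc in enumerate(docs, start=1):
    # Scalars go to the parent table, one row per doc.
    con.execute("INSERT INTO docs VALUES (?, ?, ?)",
                (doc_id, doc["url"], doc["country"]))
    # Each array element becomes one row in the child table.
    con.executemany("INSERT INTO doc_techs VALUES (?, ?, ?)",
                    [(doc_id, t["name"], t["version"]) for t in doc["techs"]])

# Filter on a scalar attribute and an array element with a single join.
matches = con.execute("""
    SELECT DISTINCT d.url
    FROM docs AS d
    JOIN doc_techs AS t ON t.doc_id = d.doc_id
    WHERE d.country = 'US' AND t.name = 'react'
    ORDER BY d.url
""").fetchall()
print(matches)
```

If there are several different arrays of objects, each would get its own child table, but a given filter still usually touches only one of them.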
Check out Vertica. It does a great job at various forms of compression. In addition, DuckDB is an easy way to get started with efficient OLAP queries.
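To give a feel for why column stores compress well, here is a toy sketch of two encodings commonly used on columns: dictionary encoding followed by run-length encoding. This is not Vertica's or DuckDB's actual implementation; the function names and the sample cms_column are made up for illustration.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each value with a small integer code into a lookup list."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]

def decode(dictionary, runs):
    """Expand (code, run_length) pairs back into the original values."""
    return [dictionary[code] for code, n in runs for _ in range(n)]

# A sorted, low-cardinality column: the best case for both encodings.
cms_column = ["drupal", "drupal", "wordpress", "wordpress", "wordpress", "wordpress"]
dictionary, codes = dictionary_encode(cms_column)
runs = run_length_encode(codes)
print(dictionary, runs)
assert decode(dictionary, runs) == cms_column  # round-trips losslessly
```

Six strings shrink to a two-entry dictionary plus two (code, count) pairs. A row-oriented document can't do this as easily because values of the same attribute aren't stored next to each other.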