Hacker News
by icsa 765 days ago
For what kind of data? For what kinds of queries?

If the columns are scalar then consider a column store.


We have data-heavy products.

Typical use case:

Filtering billions of 'docs' or rows across 50 attributes in any number of combinations

For one of the products, a doc holds data on the tech stack of a URL. The schema is ever-evolving (hence why we picked Mongo).

--

The number of queries is low, but the complexity of the queries is high.

We've already moved object storage from JSON to Parquet before uploading to Mongo, so object storage is already down to 1/4th of what it was.

The question is whether something can match the querying performance of Mongo but with better storage efficiency.

If the attributes are scalar, I would still suggest a column store that supports null values. Column compression will save you much space and give you excellent OLAP query performance.

As the schema evolves, simply add new columns.
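
To make the suggestion above a little more concrete, here is a minimal sketch in plain Python of why nullable columns compress well and absorb new attributes easily. It uses dictionary encoding, one common columnar technique; real column stores implement this differently and combine it with other encodings. The helper names (`to_columns`, `dictionary_encode`) and the sample docs are hypothetical, not taken from the thread.

```python
def to_columns(docs, attributes):
    """Pivot row-oriented docs into one list per attribute.

    Attributes missing from a doc become None, which is how a newly
    added column looks for rows written before it existed.
    """
    return {attr: [doc.get(attr) for doc in docs] for attr in attributes}


def dictionary_encode(values):
    """Store each distinct value once and refer to it by a small integer code.

    None (null) is encoded as -1 and takes no slot in the dictionary.
    """
    dictionary, index, codes = [], {}, []
    for value in values:
        if value is None:
            codes.append(-1)
            continue
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


# Hypothetical sample data: the "cdn" attribute only appears in one doc.
docs = [
    {"url": "a.example", "cms": "wordpress"},
    {"url": "b.example", "cms": "wordpress", "cdn": "cloudflare"},
    {"url": "c.example", "cms": "drupal"},
]

columns = to_columns(docs, ["url", "cms", "cdn"])
cms_dict, cms_codes = dictionary_encode(columns["cms"])
cdn_dict, cdn_codes = dictionary_encode(columns["cdn"])
```

With billions of rows and a few hundred distinct values per attribute, the integer codes are far smaller than repeating the strings in every document, and a column that is mostly null costs very little.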

Checked

Some attributes are scalar but others are not

There are arrays of objects in a lot of places. We will need to modify the schema to be completely scalar.

Will run a local test & see how this goes. Thanks a lot!

If the arrays of objects can conform to a single schema (across all scalar attributes), then make a second table to hold the objects in the arrays.

Now you have a two-table schema with (at most) one join in a given query.
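
The two-table layout described above might look roughly like the sketch below. It uses Python's built-in `sqlite3` only because it needs no installation; SQLite is a row store, so this shows the shape of the schema and the single join, not column-store performance. The table names (`sites`, `site_technologies`), the column names, and the sample docs are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical docs: scalar attributes plus one array of objects.
docs = [
    {"url": "a.example", "country": "DE",
     "technologies": [{"name": "wordpress", "category": "cms"},
                      {"name": "cloudflare", "category": "cdn"}]},
    {"url": "b.example", "country": "US",
     "technologies": [{"name": "drupal", "category": "cms"}]},
    {"url": "c.example", "country": "DE", "technologies": []},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per doc, scalar attributes only.
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT, country TEXT);
    -- One row per array element, linked back to its doc by site_id.
    CREATE TABLE site_technologies (site_id INTEGER, name TEXT, category TEXT);
""")

for site_id, doc in enumerate(docs):
    conn.execute(
        "INSERT INTO sites VALUES (?, ?, ?)",
        (site_id, doc["url"], doc["country"]),
    )
    conn.executemany(
        "INSERT INTO site_technologies VALUES (?, ?, ?)",
        [(site_id, t["name"], t["category"]) for t in doc["technologies"]],
    )

# A filter that touches both the scalar attributes and the array contents
# needs exactly one join.
rows = conn.execute("""
    SELECT s.url
    FROM sites AS s
    JOIN site_technologies AS t ON t.site_id = s.site_id
    WHERE s.country = 'DE' AND t.category = 'cms'
    ORDER BY s.url
""").fetchall()
```

Queries that only filter on scalar attributes never touch the second table, so the join cost is paid only when a filter reaches into the array contents.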

Check out Vertica. It does a great job at various forms of compression. In addition, DuckDB is an easy way to get started with efficient OLAP queries.