
	by icsa 813 days ago
	For what kind of data? For what kinds of queries? If the columns are scalar then consider a column store.


paperwhite 812 days ago

We have data-heavy products.

Typical use case:

Filtering billions of 'docs' or rows across 50 attributes in any combination.

For one of the products, a doc holds data on the tech stack of a URL. The schema is ever-evolving (which is why we picked Mongo).

The number of queries is low, but their complexity is high.

We've already moved object storage from JSON to Parquet before uploading to Mongo, which cut object storage to 1/4th.

The question is whether something can match Mongo's query performance with better storage efficiency.


icsa 812 days ago

If the attributes are scalar, I would still suggest a column store that supports null values. Column compression will save you a lot of space and give you excellent OLAP query performance.

As the schema evolves, simply add new columns.
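
A minimal sketch of the idea, in plain Python rather than any real column store: each attribute is stored as its own list, a missing value is a null (`None`), and an attribute that first appears in a later doc simply becomes a new column backfilled with nulls. The `ColumnTable` class and the `url`, `cms`, and `cdn` attribute names are hypothetical, chosen only for illustration; a real engine would also compress each column, which this toy does not do.

```python
from typing import Any, Dict, List


class ColumnTable:
    """Toy column-oriented table: one list per attribute, None marks a null."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Any]] = {}
        self.row_count = 0

    def append(self, doc: Dict[str, Any]) -> None:
        # A previously unseen attribute becomes a new column,
        # backfilled with nulls for all earlier rows.
        for name in doc:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Every column receives exactly one value per row;
        # attributes missing from this doc are stored as null.
        for name, values in self.columns.items():
            values.append(doc.get(name))
        self.row_count += 1

    def filter(self, **conditions: Any) -> List[int]:
        """Return indices of rows matching all equality conditions."""
        matches = list(range(self.row_count))
        for name, wanted in conditions.items():
            column = self.columns.get(name)
            if column is None:
                return []  # unknown attribute: nothing can match
            # Only the columns named in the query are ever scanned.
            matches = [i for i in matches if column[i] == wanted]
        return matches


# Hypothetical docs; "cdn" only appears from the second doc onward.
table = ColumnTable()
table.append({"url": "a.example", "cms": "WordPress"})
table.append({"url": "b.example", "cms": "Drupal", "cdn": "Fastly"})
table.append({"url": "c.example", "cdn": "Fastly"})
```

A query such as `table.filter(cms="Drupal", cdn="Fastly")` touches only the two columns it names, which is the property that makes filtering on arbitrary attribute combinations cheap in a columnar layout.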


paperwhite 812 days ago

Checked

Some attributes are scalar but others are not

There are arrays of objects in a lot of places, so we will need to modify the schema to be completely scalar.

Will run a local test and see how this goes. Thanks a lot!


icsa 811 days ago

If the arrays of objects can conform to a single schema (across all scalar attributes), then make a second table to hold the objects in the arrays.

Now you have a two table schema with (at most) one join in a given query.

Check out Vertica. It does a great job at various forms of compression. In addition, DuckDB is an easy way to get started with efficient OLAP queries.
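
The two-table layout can be sketched roughly as follows, in plain Python rather than Vertica or DuckDB. The scalar attributes of each doc go into a parent table, and each object in the array goes into a child table keyed by the parent's `doc_id`. The field names (`technologies`, `name`, `category`, `country`) and the helpers `split_docs` and `urls_using` are assumptions made up for this example, not taken from the actual data.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_docs(docs: List[Row], array_field: str) -> Tuple[List[Row], List[Row]]:
    """Split nested docs into a parent table and a child table.

    Each object in doc[array_field] becomes one child row that carries
    the parent's doc_id as a foreign key.
    """
    parents: List[Row] = []
    children: List[Row] = []
    for doc_id, doc in enumerate(docs):
        scalars = {k: v for k, v in doc.items() if k != array_field}
        parents.append({"doc_id": doc_id, **scalars})
        for item in doc.get(array_field, []):
            children.append({"doc_id": doc_id, **item})
    return parents, children


def urls_using(parents: List[Row], children: List[Row], **conditions: Any) -> List[str]:
    """The single join: filter child rows, then look up parents by doc_id."""
    hit_ids = {
        child["doc_id"]
        for child in children
        if all(child.get(k) == v for k, v in conditions.items())
    }
    return [p["url"] for p in parents if p["doc_id"] in hit_ids]


# Hypothetical docs with one array-of-objects field.
docs = [
    {"url": "a.example", "country": "US",
     "technologies": [{"name": "nginx", "category": "server"},
                      {"name": "React", "category": "frontend"}]},
    {"url": "b.example", "country": "DE",
     "technologies": [{"name": "Apache", "category": "server"}]},
    {"url": "c.example", "country": "US", "technologies": []},
]
parents, children = split_docs(docs, "technologies")
```

Both resulting tables contain only scalar columns, so each can be stored column-wise, and a query like `urls_using(parents, children, category="server")` needs just the one join on `doc_id`.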
