Hacker News new | ask | show | jobs
by mdellavo 1583 days ago
Are you unrolling the nested JSON data structures and storing as traditional K/Vs in an EAV pattern? Possibly using one table for each datatype or using a sparse table?

I'd be curious how this performs for complex queries - does this rely heavily on index intersection?

2 comments

Yes, an EAV pattern is being used. The end result in query speed with the particular table designs and indexes is essentially just like what index intersection gives you but without the setup. And of course EAV is not a pattern that you can easily roll out by hand when you need it.

Please try this demo which is a sample of some queries against Clinical Trials data. https://schemafreesql.com/demo.html#clinicalTrial Although this demo data set is a limited size, the same queries when run against the entire clinicalTrial dataset performed very respectably.

EAV is a pattern you can easily roll out - https://docs.sqlalchemy.org/en/14/orm/examples.html#module-e...

Another commenter illustrated some of the issues you will run into with this pattern at scale.

It would probably help to benchmark some complex queries on a sizable data set. And compare against mongo, postgres jsonb, vanilla eav, clickhouse, etc. Without much information to go on, it's hard to know what this is.

I agree with getting some benchmarks up there. Until that's done, I can share what some preliminary tests revealed. Just to give you more of feel for where this tool might fit.

We imported the complete Clinical Trials dataset (380k docs ~ 200 attributes each) into SFSQL, Mongo and a Postgres JSON column.

Import speed with raw documents: Mongo and Postgres win hands down. Why? Very little processing to be done on their parts, while SFSQL unravels the structures and stores everything with indexes already in place. Excluded use cases for SFSQL? high-volume logging, data-sinks, etc.

Query speed against raw document elements: Mongo and Postgres very fast, SFSQL respectable. Why? Mongo is optimized for querying raw documents, Postgres obviously did their work as well. Excluded use cases for SFSQL? storage of raw, unprocessed json documents.

Then we extracted all unique instances of a particular attribute from that data and put them into their own collection in mongo and it's own table in Postgres. The number of distinct objects extracted to external collection/table was just 11. Then we modified the queries so that they JOINed to the external collection/table. Result: SFSQL still respectable (nothing changed internally or speed wise). Postgres and mongo displayed a huge slowdown (and this was just a single join). Included use cases for SFSQL? complex relational/referential data.

what about applying a GIN index on the JSONB?
Did not try that kind of index on postgres. Thx. Will keep that on the list for benchmarking.
A more detailed response will be provided shortly but I do encourage you to launch a demo, https://schemafreesql.com/demo.html. You will be provided with an endpoint and access key to your own dedicated SFSQL service. No login required. You can start trying it out from within your own environment.