Hacker News new | ask | show | jobs
by oliwarner 2142 days ago
We've been using it pretty heavily and the ability to do deep queries like Person.objects.filter(data__family__spouse__name__istartswith="oli") is great for one-offs but the performance stinks.

As it well should —this isn't a complaint— just a reminder that keeping denormalised, first party data that can be well indexed is invaluable when you're talking about huge datasets.

It'd also be nice to layer in JSON Schema directly into the modelling but there is a draft-7 compliant Python project that you can use for this in the model's clean() method.

2 comments

Postgres allows you to create an index on keys within a JSONB column[0]. It may be the case that Django doesn't offer support for this but I would expect you can probably get around that with raw SQL.

[0]: https://www.postgresql.org/docs/current/datatype-json.html#J...

In addition to raw SQL, postgres JSON fields can be indexed with GinIndex https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/i...
Yeah I've not had too much luck with GINIndex. Firstly I can't rule out user error but they just didn't work well. Maybe it was the volume (and the index generation wasn't keeping up with the ingress of data —"only" around 2w/s— or it's the 50M rows...

This was WORM data so denormalising on write didn't need update triggers, and was the difference between 100ms and 10s queries. That was still too slow for us so we started caching expected data outside postgres.

> data__family__spouse__name__istartswith="oli"

that's some syntax right there :)

Couldn't tell if you meant to say it was good or bad, but actually I found it to be crystal clear which is impressive for a complicated dataset.
neither, I guessed it was better than what was prior even though it stings the eye