| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by oliwarner 2142 days ago

We've been using it pretty heavily and the ability to do deep queries like Person.objects.filter(data__family__spouse__name__istartswith="oli") is great for one-offs but the performance stinks.

As it well should —this isn't a complaint— just a reminder that keeping denormalised, first party data that can be well indexed is invaluable when you're talking about huge datasets.

It'd also be nice to layer in JSON Schema directly into the modelling but there is a draft-7 compliant Python project that you can use for this in the model's clean() method.

2 comments

DasIch 2142 days ago

Postgres allows you to create an index on keys within a JSONB column[0]. It may be the case that Django doesn't offer support for this but I would expect you can probably get around that with raw SQL.

[0]: https://www.postgresql.org/docs/current/datatype-json.html#J...

link

Townley 2142 days ago

In addition to raw SQL, postgres JSON fields can be indexed with GinIndex https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/i...

link

oliwarner 2141 days ago

Yeah I've not had too much luck with GINIndex. Firstly I can't rule out user error but they just didn't work well. Maybe it was the volume (and the index generation wasn't keeping up with the ingress of data —"only" around 2w/s— or it's the 50M rows...

This was WORM data so denormalising on write didn't need update triggers, and was the difference between 100ms and 10s queries. That was still too slow for us so we started caching expected data outside postgres.

link

agumonkey 2142 days ago

> data__family__spouse__name__istartswith="oli"

that's some syntax right there :)

link

kfarr 2142 days ago

Couldn't tell if you meant to say it was good or bad, but actually I found it to be crystal clear which is impressive for a complicated dataset.

link

agumonkey 2142 days ago

neither, I guessed it was better than what was prior even though it stings the eye

link