by wvenable 3576 days ago
That's perfectly reasonable. But then you also let them query on those arbitrary custom properties and that's where the performance issues are? If so, that's a fairly hard problem to solve.

Taking the well-defined subset of searchable properties and making them columns, as described in the article, is really the best solution.

1 comments

As of right now, our schema is literally:

    user_id | event_id | time | data
where data is a JSONB blob that contains every other piece of information about an event. Currently, we get a row estimate of one for pretty much every query. We've been able to work around the lack of statistics by using a very specific indexing strategy (discussed in a talk Dan gave[0]) that leaves the planner very few choices, and additionally by turning off nested loop joins.

We are planning on pulling out the most common properties that we store in the data column, which will give us proper statistics on all of those fields. I am currently experimenting with what new indexing strategies we will be able to use thanks to better statistics.
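
To make the "pulling out" step concrete, here is a rough sketch of what promoting a few keys from the blob into real columns could look like. It is not our actual migration: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for Postgres, stores data as JSON text instead of JSONB, and the property names (event_type, browser) and the promote helper are made up for illustration.

```python
import json
import sqlite3

# Hypothetical property names; which keys are actually "most common" is an assumption.
PROMOTED = ("event_type", "browser")


def promote(conn, promoted=PROMOTED):
    """Move selected keys out of the JSON `data` blob into dedicated columns."""
    for name in promoted:
        conn.execute(f'ALTER TABLE events ADD COLUMN "{name}" TEXT')
    assignments = ", ".join(f'"{name}" = ?' for name in promoted)
    rows = conn.execute("SELECT rowid, data FROM events").fetchall()
    for rowid, blob in rows:
        props = json.loads(blob)
        # A missing key becomes NULL in the new column.
        values = [props.pop(name, None) for name in promoted]
        conn.execute(
            f"UPDATE events SET {assignments}, data = ? WHERE rowid = ?",
            (*values, json.dumps(props), rowid),
        )
    conn.commit()


# Stand-in for the four-column schema described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_id INTEGER, time TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2016-01-01T00:00:00",
         json.dumps({"event_type": "click", "browser": "firefox", "x": 3})),
        (2, 11, "2016-01-01T00:01:00",
         json.dumps({"event_type": "view", "custom": "abc"})),
    ],
)

promote(conn)

# Promoted properties can now be indexed and filtered like any other column.
conn.execute("CREATE INDEX events_type_idx ON events (event_type)")
clicks = conn.execute(
    "SELECT user_id FROM events WHERE event_type = ?", ("click",)
).fetchall()
```

Everything that was not promoted stays in data, so arbitrary custom properties still work as before. In Postgres the payoff is that ANALYZE collects per-column statistics for the promoted columns, which is what the planner needs to produce better row estimates.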

[0] https://www.youtube.com/watch?v=NVl9_6J1G60