Hacker News new | ask | show | jobs
by beaconstudios 2858 days ago
> New features require building a routing map for all possible URLs on boot, which is a dramatically heavy operation. We were eventually forced to forgo our traditional ORM layer entirely to get any degree of performance (insert: shock/awe) - Related: It turns out SQLite3 has a query limit of 999 SQL variables, so we had to implement a recursive query strategy for SQLite only. Wild.

Out of pure technical curiosity, how did this come to be the chosen approach? A query with 999 variables strikes me as a serious code smell, but at the same time I don't doubt you guys seriously considered the alternatives before choosing this route.

1 comments

We're loading all public-facing resources - so posts, tags, authors and importantly the relations between them - in as few queries as possible, using fairly typical where...in queries in order to build the relations. SQLite uses variables for in queries, and so sites with a lot of content triggered the "more than 999 variables" error.

Solvable, of course, but quite an interesting limitation to discover!

I haven't check the code, but as few queries as possible is not always the best approach. I bet you guys spent a lot of time on the problem, but still very curious how you landed with this decision.

For the relation data, you should check out Hexastore [0] (essentially graph storage). It takes up much more space, but you can follow any paths of relationship and you can do it step by step instead of loading everything at once.

[0] http://www.vldb.org/pvldb/1/1453965.pdf

I discovered a similar limit with postgresql, mainly due to the protocol sending the query length as a 16-bit number, limiting the total length to ~65k characters. In that case it was far better to create a temporary table and JOIN that rather than using an " IN (..)" clause anyway, so that's another option if SQLite supports it.
There's no 16bit number limiting query length in PG, the limit is 1GB and has been for a LONG time. But it's indeed often a good idea to use a temp table for some of such cases.
This sounds like an interesting usecase for a graph database, is that something you have considered?
It's an interesting idea but Ghost is a primarily self-hosted application with a non-technical user base. The freedom to swap out or add core dependencies like database servers isn't really available in that environment.
Oracle had this limitation, too, or at least it did a few years ago.