by andrewingram 3274 days ago
I mean structuring your server in such a way that you're not trying to automatically convert your entire query AST into a single massively complicated SQL query. It's true that in simple cases, one complex query can be the fastest way to get the data you need, but you leave this territory pretty quickly.

It's perfectly valid to identify subtrees of your query that would benefit from being executed as a database query, but doing that to your entire query just sounds like you're asking for trouble. I'd even go so far as to call it a premature optimization.
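To make the "subtree" idea concrete, here's a minimal sketch in Python with an in-memory SQLite database. Everything in it is an assumption for illustration: the `project`/`task` tables, the `search_service` stand-in for a non-SQL backend, and the function names. The root field is resolved outside the database, and only the project -> tasks subtree below it is collapsed into one joined query.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT,
                       project_id INTEGER REFERENCES project(id));
    INSERT INTO project VALUES (10, 'Website');
    INSERT INTO task VALUES (100, 'Design', 10), (101, 'Build', 10);
""")

def search_service(term):
    """Stand-in for a non-SQL backend (search index, third-party API, ...)."""
    return [10] if term.lower() in "website" else []

def load_projects_with_tasks(conn, project_ids):
    """Resolve the project -> tasks subtree with a single joined query."""
    if not project_ids:
        return []
    marks = ", ".join("?" for _ in project_ids)
    rows = conn.execute(
        "SELECT p.id, p.name, t.id, t.name FROM project p "
        "LEFT JOIN task t ON t.project_id = p.id "
        f"WHERE p.id IN ({marks}) ORDER BY p.id, t.id",
        list(project_ids),
    ).fetchall()
    projects = {}
    for pid, pname, tid, tname in rows:
        project = projects.setdefault(pid, {"id": pid, "name": pname, "tasks": []})
        if tid is not None:  # LEFT JOIN yields NULLs for projects without tasks
            project["tasks"].append({"id": tid, "name": tname})
    return list(projects.values())

def resolve_search(conn, term):
    """Root field comes from the search backend; only the subtree hits SQL."""
    return load_projects_with_tasks(conn, search_service(term))

result = resolve_search(conn, "web")
print(result)
```

The point of the split is that `resolve_search` doesn't care where the ids came from, so the SQL-backed subtree can be swapped out later without touching the root resolver.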

1 comments

I kind of understand why you'd think it's a massively complicated query. You are probably thinking of the kind of joins between two tables on random columns with weird conditions, which end up as full table scans.

I am talking about queries/joins between tables that have foreign keys between them, like client/project/task/comment. I bet 90% of GraphQL schemas expose those kinds of relations between types.

For those types of relations (with FKs) I can generate a single query that is as fast as it can be (certainly faster than dataloader), and as far as I've tested (a few million rows in the tables, 3-7 levels in a query) I didn't leave the fast territory :) Of course there might be edge cases ...
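A rough sketch of what "one query for FK-linked types" can look like, in Python with in-memory SQLite. This is not the commenter's actual generator: the schema, the `FOREIGN_KEYS` map, and the `build_join`/`fetch_tree` names are all hypothetical, and every table is assumed to have `id` and `name` columns.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT,
                          client_id INTEGER REFERENCES client(id));
    CREATE TABLE task    (id INTEGER PRIMARY KEY, name TEXT,
                          project_id INTEGER REFERENCES project(id));
    INSERT INTO client  VALUES (1, 'Acme');
    INSERT INTO project VALUES (10, 'Website', 1), (11, 'App', 1);
    INSERT INTO task    VALUES (100, 'Design', 10), (101, 'Build', 10),
                               (102, 'Ship', 11);
""")

# Child table -> (parent table, FK column on the child).
FOREIGN_KEYS = {"project": ("client", "client_id"),
                "task": ("project", "project_id")}

def build_join(path):
    """Turn a nesting path like ['client', 'project', 'task'] into one query."""
    columns = ", ".join(f"{t}.id, {t}.name" for t in path)
    sql = f"SELECT {columns} FROM {path[0]}"
    for child in path[1:]:
        parent, fk = FOREIGN_KEYS[child]
        sql += f" LEFT JOIN {child} ON {child}.{fk} = {parent}.id"
    return sql + " ORDER BY " + ", ".join(f"{t}.id" for t in path)

def fetch_tree(conn, path):
    """Run the single query and fold the flat rows back into nested dicts."""
    roots = []
    for row in conn.execute(build_join(path)):
        siblings = roots
        for depth, table in enumerate(path):
            node_id, name = row[2 * depth], row[2 * depth + 1]
            if node_id is None:  # LEFT JOIN found no child at this level
                break
            # Rows are ordered by id at every level, so a new id means a new node.
            if not siblings or siblings[-1]["id"] != node_id:
                node = {"id": node_id, "name": name}
                if depth + 1 < len(path):
                    node[path[depth + 1] + "s"] = []
                siblings.append(node)
            if depth + 1 < len(path):
                siblings = siblings[-1][path[depth + 1] + "s"]
    return roots

tree = fetch_tree(conn, ["client", "project", "task"])
print(build_join(["client", "project", "task"]))
print(tree)
```

Every join here is on an indexed primary key or a foreign-key column, which is the case described above. It says nothing about the edge cases.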

About premature optimisation: everyone likes to quote that, but never the full sentence, which is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." https://shreevatsa.wordpress.com/2008/05/16/premature-optimi...

The paper where it was published was about using goto to optimize loops and other such small things; it was never "targeted" at algorithms and architecture.

As I think I mentioned above, I was getting 10X throughput with joins compared to dataloader, and I would not call that premature.
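For anyone unfamiliar with the comparison, here is a small Python/SQLite sketch of the two shapes being contrasted: dataloader-style batching issues one `IN (...)` query per level of the tree, while the join issues one query in total. The schema and helper names are made up for illustration, and this only counts round trips; it does not measure or reproduce the throughput figure above.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, client_id INTEGER);
    CREATE TABLE task    (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER);
    INSERT INTO client  VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO project VALUES (10, 'Website', 1), (11, 'App', 2);
    INSERT INTO task    VALUES (100, 'Design', 10), (101, 'Ship', 11);
""")

query_count = 0

def run(sql, params=()):
    """Execute a query and count it as one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def load_children(table, fk, parent_ids):
    """One batched query for a whole level, as a DataLoader batch function would do."""
    if not parent_ids:
        return []
    marks = ", ".join("?" * len(parent_ids))
    return run(f"SELECT id, name, {fk} FROM {table} WHERE {fk} IN ({marks})",
               list(parent_ids))

def batched():
    """Three levels, three queries: each level waits for the ids of the one above."""
    clients = run("SELECT id, name FROM client")
    projects = load_children("project", "client_id", [c[0] for c in clients])
    tasks = load_children("task", "project_id", [p[0] for p in projects])
    return clients, projects, tasks

def joined():
    """Same three levels in a single query."""
    return run(
        "SELECT c.id, c.name, p.id, p.name, t.id, t.name FROM client c "
        "LEFT JOIN project p ON p.client_id = c.id "
        "LEFT JOIN task t ON t.project_id = p.id"
    )

query_count = 0
batched()
batched_queries = query_count

query_count = 0
joined_rows = joined()
joined_queries = query_count

print(batched_queries, joined_queries)
```

The batched version grows by one query per nesting level regardless of row count; how much that matters in practice depends on latency to the database and on what each query costs.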

I use a programming language that is 1-2 orders of magnitude slower than C, should I switch today?

I get my data for a full UI within the budget I've allowed for uncached scenarios (100ms, but I want to go to 50ms). The approach you're suggesting will (in some circumstances) give me some short-term wins in terms of throughput and response times, but you've not said anything to suggest that I won't lose these benefits as I gradually transition into a domain-driven or micro-services (yuck) architecture.

I like to build my GraphQL servers under the assumption of a domain-driven architecture (because that's where all the projects I've worked on seem to end up, your mileage may vary), and then shoe-horn in some short-term performance tricks when I can.

I'm possibly a special snowflake here, but it's been a long time since I've had the opportunity to work on a project where I can go straight to the DB. Be it Elastic Search, a 3rd party, ill-advised micro-services, or complex logic in between storage and presentation, nothing has quite been a pure DB project in the last 6-7 years.

Of course, you could argue this is premature architecture ;) but many of these complexities are from day one, or at least pretty early in a project's life.