by oefrha 2295 days ago

A tangentially related question about these codeless GraphQL API/app generators while we're on this topic. I'd appreciate opinions on this one.

A while back I was building a read-heavy site where the most popular views are easily cacheable -- think a scoreboard monitored by hundreds or thousands of people simultaneously, and updated every 30 seconds. The view, or the data for the view, should obviously be cached, so that thousands of people don't need to hit the database at the same time with the same query.

I looked into whether I could leverage Hasura for this (no particular reason, just heard rave reviews and wanted to give it a shot). Turns out Hasura hits the database on every single request, and it seems there's no way to avoid this. Of course GraphQL doesn't lend itself well to caching either.

Are there any solutions to this sort of problem? Or was my use case fundamentally not suitable for Hasura or similar tools?

3 comments

gavinray 2295 days ago

In large part, caching in GraphQL comes from client-side tools like Apollo or Relay, which have fairly sophisticated caching abilities.

https://www.apollographql.com/docs/react/caching/cache-confi...

https://relay.dev/docs/en/network-layer#caching

You can also implement things like Dataloader to batch/cache your requests:

https://github.com/graphql/dataloader
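
For illustration, the batching and caching idea behind Dataloader can be sketched in a few lines of Python. This is not the library's API (the real library is JavaScript); `TinyLoader` and `fetch_users` are hypothetical names, and the sketch assumes the batch function returns values in the same order as the keys it was given.

```python
import asyncio


class TinyLoader:
    """Toy loader: coalesces load() calls made in the same event-loop
    tick into one batch call and memoizes the result per key."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async fn: list of keys -> list of values
        self._cache = {}           # key -> Future, shared by duplicate loads
        self._queue = []           # keys waiting for the next batch
        self._scheduled = False
        self._tasks = set()        # keep references so tasks are not collected

    def load(self, key):
        if key in self._cache:
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._queue.append(key)
        if not self._scheduled:
            self._scheduled = True
            loop.call_soon(self._dispatch, loop)
        return future

    def _dispatch(self, loop):
        keys, self._queue = self._queue, []
        self._scheduled = False
        task = loop.create_task(self._run(keys))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run(self, keys):
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            # Do not cache failures; let a later load() retry.
            for key in keys:
                self._cache.pop(key).set_exception(exc)
            return
        for key, value in zip(keys, values):
            self._cache[key].set_result(value)


async def fetch_users(ids):
    # Hypothetical batch function; a real one would run one SQL query.
    return [{"id": i} for i in ids]


async def demo():
    loader = TinyLoader(fetch_users)
    # Three loads, two distinct keys, one call to fetch_users.
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))
```

A loader like this is normally created per request, so its cache removes duplicate lookups inside one query but does not share results across users.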

Hasura in particular implements two forms of caching. While not directly data-related, it does cache both the GraphQL query-plan and SQL query-plan with prepared statements:

https://hasura.io/blog/fast-graphql-execution-with-query-cac...

Hasura's architecture ensures that 2 "caches" are automatically hit so that performance is high:

GraphQL query plans get cached at Hasura: This means that the code for SQL generation, adding authorization rules etc doesn't need to run repeatedly.

SQL query plans get cached by Postgres with prepared statements: Given a SQL query, Postgres needs to parse, validate and plan its own execution for actually fetching data. A prepared statement executes significantly faster, because the entire execution plan is already "prepared" and cached!

So caching on the client layer, combined with caching on the server layer, leads to pretty tiny and performant queries, especially if you use something like Relay, which can fetch only individual fragments.

oefrha 2295 days ago

I'm aware of client side caching, but I'm talking about server-side caching. When thousands of people hit the server for the exact same updates every 30 seconds (for my scoreboard), client side caching doesn't help with anything.

I'm aware of the query plan caching mechanisms in Hasura too (I read exactly that blog post), but they don't solve the "always hits the database with every single request" issue, unlike a handwritten API endpoint / a traditional server rendered view where either the db response or the entire HTTP response could be easily cached with redis/memcached.

xtagon 2295 days ago

Have a look at Materialize[0]. It automatically updates your materialized views using a dataflow model (differential dataflow) so that reads are fast even if you're running the same query over and over. It's a step up from rolling your own materialized views in Postgres or whatever, because you don't have to create triggers or anything like that, you just create views in plain SQL.

Then of course you can use a GraphQL layer above it, just like you would with any GraphQL resolvers backed by SQL, or just cut out the middleman and use plain REST instead of GraphQL.

[0]: https://materialize.io/

gavinray 2295 days ago

There's nothing preventing you from putting Redis/Memcached in front of your GQL server. Apollo Server provides server-level cache strategies too, I believe.

oefrha 2295 days ago

Another caching layer in front of the GraphQL server could work. Apollo Server is mutually exclusive with a zero code solution like Hasura though.

mattkrick 2295 days ago

Put Redis in front of your GraphQL server. Let the key be the query hash and the TTL be 30 seconds. If your query has many duplicate lookups (e.g. the same user appears multiple times on the leaderboard), use Dataloader.
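
A minimal sketch of that setup, assuming a small in-memory class stands in for Redis so the example runs on its own. `TTLCache`, `query_key`, `cached_execute` and the `execute` callback are all hypothetical names; with a real Redis client the `get` and `setex` calls would go to the server instead.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX (an assumption for this sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def setex(self, key, ttl, value):
        self._store[key] = (self._clock() + ttl, value)


def query_key(query, variables=None):
    """Hash the query text together with its variables."""
    payload = json.dumps(
        {"query": query, "variables": variables or {}}, sort_keys=True
    )
    return "gql:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_execute(cache, execute, query, variables=None, ttl=30):
    """Serve from cache when possible; otherwise call the GraphQL server."""
    key = query_key(query, variables)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = execute(query, variables)  # e.g. an HTTP POST to the server
    cache.setex(key, ttl, json.dumps(result))
    return result
```

With a 30 second TTL the database sees roughly one scoreboard query per window per distinct query, rather than one per viewer. The sketch does not guard against many requests missing at the same instant when an entry expires.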

oefrha 2295 days ago

I suppose authentication & authorization would complicate things a fair bit, but yeah this should at least work for white-listed, public queries. Thanks.
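
One way to keep that manageable is to let the cache layer decide up front whether a request may share a cached response at all. The sketch below is an assumption-heavy illustration: `PUBLIC_QUERIES`, `shared_cache_key` and the header names it checks are hypothetical, and a real deployment would need to match whatever credentials its server actually accepts.

```python
import hashlib
import json

# Hypothetical allow-list of exact query texts that are safe to share.
PUBLIC_QUERIES = {
    "query Scoreboard { scoreboard { team points } }",
}

# Assumed credential-carrying headers; adjust to the real setup.
CREDENTIAL_HEADERS = {"authorization", "cookie"}


def normalize(query):
    """Collapse whitespace so formatting differences share one entry."""
    return " ".join(query.split())


def shared_cache_key(query, variables, headers):
    """Return a cache key for public requests, or None to bypass the cache."""
    header_names = {name.lower() for name in headers}
    if header_names & CREDENTIAL_HEADERS:
        return None  # per-user response: never serve from the shared cache
    text = normalize(query)
    if text not in PUBLIC_QUERIES:
        return None  # unknown query: go straight to the server
    payload = json.dumps(
        {"query": text, "variables": variables or {}}, sort_keys=True
    )
    return "gql:public:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Matching on the full query text rather than on a client-supplied operation name avoids caching an arbitrary query that merely reuses an allowed name.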

cpursley 2295 days ago

You can always use a Postgres materialized view.
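
In Postgres this would be a `CREATE MATERIALIZED VIEW ... AS SELECT ...` plus a periodic `REFRESH MATERIALIZED VIEW`. SQLite has no materialized views, so the sketch below only emulates the idea with a snapshot table so that it can run anywhere; the table names and the `refresh_snapshot` helper are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE scores (team TEXT, points INTEGER);
        -- Stand-in for the materialized view: a plain table of results.
        CREATE TABLE scoreboard_snapshot (team TEXT PRIMARY KEY, total INTEGER);
        """
    )
    return conn


def refresh_snapshot(conn):
    """Recompute the aggregate once (plays the role of REFRESH)."""
    with conn:  # one transaction: readers never see a half-built snapshot
        conn.execute("DELETE FROM scoreboard_snapshot")
        conn.execute(
            "INSERT INTO scoreboard_snapshot "
            "SELECT team, SUM(points) FROM scores GROUP BY team"
        )


def read_scoreboard(conn):
    """Cheap read path: no aggregation, just the precomputed rows."""
    return conn.execute(
        "SELECT team, total FROM scoreboard_snapshot ORDER BY total DESC, team"
    ).fetchall()
```

Running the refresh every 30 seconds means each viewer's request is a cheap read of precomputed rows. The request still reaches the database, so this reduces the cost per request rather than the number of requests.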
