Hacker News new | ask | show | jobs
by stevencorona 1095 days ago
I've been using Elixir for the past 5-6 years for my startup. We use pg_notify extensively to broadcast changes between running nodes (basically, use Phoenix.PubSub locally in our apps, with a GenServer to subscribe+re-broadcast using pg_notify).

This has been a really elegant and low-complexity way to get distributed pubsub without the complexity of running a distributed erlang cluster (which seems a lil bit painful in a K8S+Continuous Deploy world)

There -are- some big downsides to be aware of though.

1. You can't use PgBouncer w/ LISTEN/NOTIFY. This has been really painful because of the high memory overhead of a pgsql connection + elixir keeping a pool of open pgsql connections. The tried and true method of scaling here is to just use PgBouncer. We've kicked the can on this by vastly over-provisioning our pg instance, but this has cost $10s of thousands on the cloud. Of course, it's solvable (dedicated non-pgbouncer connection pool just for LISTEN/NOTIFY, for example), but painful to unwind.

2. The payload has a fixed size limit (8KB, IIRC). This has bitten us a few times!

Even though I really like pg_notify, I think that if I were starting over, I'd probably just use Redis Pub/Sub to accomplish the same thing. Tad bit more complex if you're not already running Redis, but without the downsides. (Of course, w/ Redis, you don't get the elegance of firing a notification via a pg trigger)

7 comments

We got around this at my company by just pooling all of the LISTEN/NOTIFY streams into a single database connection in software, here's a sample implementation:

function software_listen(channel, callback):

  if not channel_listened(channel):

    sql("LISTEN " + channel)

  listeners[channel].append(callback)

function on_message(channel, data):

  for listener in listeners[channel]

    listener(channel, data)

function unlisten(channel, listener):

  listeners[channel].remove(listener)

  if len(listeners[channel]) == 0:

    sql("UNLISTEN " + channel)

Here's the actual go implementation we use:

https://gist.github.com/ColinChartier/59633c1006407478168b52...

For #1 I've been keeping a keen eye on pgcat [1], in particular the https://github.com/postgresml/pgcat/issues/303 which implies that it should be possible to add support for transaction mode LISTEN/NOTIFY support.

[1] https://github.com/postgresml/pgcat

Phoenix.PubSub is basically a noop service. It really just works.

If discovering nodes is difficult in your env, try using a listen/notify libcluster strategy:

https://github.com/supabase/supavisor/blob/main/lib/cluster/...

A noop service as in a service that is mature and needs no additional modification?
Yeah as in zero ops needed. Maybe NoOps.
I believe #2 was the main driver for the supabase team to build their real-time component: https://github.com/supabase/realtime

Background/announcement: https://supabase.com/blog/supabase-realtime-multiplayer-gene...

Triggers can affect throughput quite a bit on busy tables.

And we didn't want people schemas polluted with triggers.

But also we use Erlang distribution and Phoenix.PubSub because with a global network clients connected to the same node in the same region will get normal Broadcast messages to each other faster. If we ran them through Postgres or Redis the added latency wouldn't work for a global thing.

Which as far as I gather uses Phoenix + Erlang distribution to get Phoenix.PubSub going. Which parent was not wanting to do. Funny full circle.
Have you contacted the Supavisor team about adding listen/notify support? Supavisor may be your way forward: https://github.com/supabase/supavisor
I found a relevant GitHub issue for this: https://github.com/supabase/supavisor/issues/85
Accepting pull requests if anyone is interested :D
Does GenServer's mailbox introduce any lag/latency? We've found at high loads, GenServer's mailbox can be a bottleneck
WalEx gets around these issues.
Hi! Does WalEx continue where it left off after disconnects?