Hacker News new | ask | show | jobs
by AgharaShyam 126 days ago
100% agreed regarding shipping a read-replica, for any sufficiently complex enterprise app (ERP, CRM, accounting, etc.).

Customers need it to build custom reports, archive data into a warehouse, drive downstream systems (notifications, audits, compliance), and answer edge-case questions you didn’t anticipate.

Because of that, I generally prefer these patterns over a half-baked built-in analytics UI or an opinionated REST API:

Provide a read replica or CDC stream. Let sophisticated customers handle authz, modelling, and queries themselves. This gets harder with multi-tenant DBs.

Optionally offer a hosted Data API, using something like -- PostgREST / Hasura / Microsoft DAB. You handle permissions and safety, but stay largely un-opinionated about access patterns.

Any built-in metrics or analytics layer will always miss edge cases.

With AI agents becoming first-class consumers of enterprise data, direct read access is going to be non-negotiable.

Also, I predict the days of charging customers to access their own goddamn data, behind rate-limited + metered REST APIs are behind us.

1 comments

I fully agree in spirit, but in practice, read-replica's have some edge cases that are hard to control for. Namely, the incentives aren't fully aligned between the database host and consumer, and that dynamic can lead to some difficult resourcing decisions for the DB host. Whereas an API can be rate limited or underlying API queries can be optimized (however frustrating that might be for consumers).

The CDC stream option you flagged is more viable in my (admittedly biased) opinion. At my company (Prequel) our entire pitch is basically "you should give your customer's a live replica of their data in whatever data platform they want it in" (and let us handle the cross-platform compatibility & multi-tenant DB challenges).

I think this problem could also be a killer use case for Open Table Formats, where the read-replica architecture can be mirrored but the cost of reader compute can be assumed by the data consumer.

To your point, this is only going to be more important with what will likely be a dramatic increase in AI agent data consumption.