Hacker News new | ask | show | jobs
by akrolsmir 1549 days ago
More context: The main thing I'm unhappy with is the extra developer burden imposed by needing to denormalize information. E.g: I have a user document in Firestore, with userId, name, and avatarUrl. If I want to be able to fetch a list of comments and have the name & avatarUrl of the creators, in Firestore I have to write those alongside userId. Then, if I later add isVerified to the user document, I need to either backfill my entire db and denormalize again; or client-side handle the missing case.

Then the other pain point is the "joins" use case; right now we do the equivalent of fetching all comments & users, then doing an in-memory join. Ideally, we could craft a single request that just says "get the 10 latest comments on this market, plus the associated avatars" without data duplication and without doing a bunch of up-front thinking about exactly how to structure indexes.

My hesitation with relational DBs comes from the mismatch between client data model (loosely, JSON objects of pointers) and how it's represented in the DB (in a row); plus the requirements of needing to specify your e.g. indices up front, and annoyance of doing migrations. I'm hopeful someone's found a graph-type solution to work really well for them!

5 comments

I mean, if you don’t want to denormalize your data, you’re going to need to think about indexes in some capacity regardless (this is true for non-relational DBs like Dynamo and Mongo too).

"get the 10 latest comments on this market, plus the associated avatars" couldn’t be better suited to a relational DB. That’s a textbook use case that Postgres would be amazingly well suited to.

Also: remember with Firestone that you’re paying for redundancy and availability that’s entirely Google managed. Most DB offerings you work with on your own are significantly more hands on as far as recovery / backups / replication go.

Engineering time is usually more expensive than server costs when you’re a startup, so think about how much time it’d take to do it yourself before you decide to optimize your server costs over R&D costs.

Yeah - totally agreed re: eng time > server costs; the db costs are the least significant part of the equation.

Fundamentally, I'm trying to optimize for something like "developer happiness as we build out lots of new features quickly". My dream workflow would look something like: take the Typescript types we've defined on the client, and shove that somewhere in the cloud; then later query to pull out exactly the data we need to render any particular view of our site (ala GraphQL)

AND I'd really like to not have to spend a lot of up-front time knowing exactly which indices to set up, or to figure out complicated migrations later. And I'd like to not think about hosting/managing replications, etc. Maybe that's too many asks, and I'm being too greedy! I'm just hoping that someone's solved this pain point already, and I just haven't heard about it.

We do something kind of similar, only in reverse, with Prisma and PostgreSQL (models are defined in Prisma and TypeScript types are automatically generated for use in the client). It's been a pleasant developer experience, though we do not have any realtime data needs yet (we just do basic polling for the few parts that need reactivity based on database updates). I wasn't aware of the Supabase realtime PG project which was discussed here, so thanks for bringing that to our attention!
Supabase developer here. If you're interested in realtime, be sure to mark your calendar for our upcoming launch week (we do this a few times a year.) It's the week of March 28 and you'll definitely be very interested in some of the stuff we're launching.
Not a well-understood solution to me, but you may be interested in looking into FaunaDB. IIUC, It works by defining a GraphQL API and queries you want to use on it, and it will create the correct data structures behind the scenes to allow efficient queries of the form you've provided.

It is really new, though, from all I can tell.

Why not do something like:

1) Fetch the list of comments

2) Add a listener on the public user info for each comment poster

3) Render the comments immediately. When the user info is available, re-render with the avatar information.

The nice thing about this is that the avatar information will immediately update in real-time as soon as someone updates their avatar. Yes, with a KV-store, you need to do more reads because you can't join data (which implicitly will do a read btw), but it doesn't seem like that big of a deal to me. Immediately reflecting changes to the public user state seems nicer than the convenience of a join.

I'm a longtime user of Google App Engine / Cloud Datastore / Firestore and wrote a Java ORM for that environment that has achieved some popularity. I really like the datastore for some applications, but there's a pretty good chance that it's a bad fit for you. While I'm only casually familiar with your problem domain, it seems at first glance like the kind of thing that would scale reasonably well in a traditional RDBMS. You could would get a lot of value from joins and aggregations, and you don't really have zillions of elements changing all at once.

I could probably give you some advice on how to use the datastore better (most of which would be along the lines of "don't denormalize, store foreign keys and use batch key fetches instead") but it might just be the wrong tool for the job. If you want to talk about it, contact info is on my profile.

Yeah - we do store foreign keys, but Firestore only supports fetching a batch of 10 keys at a time afaict. It might just be the wrong tool, like you said.

We do very much want real-time updating, but there are okay integrations for that with RBDMS's now (eg Supabase). Primarily, I'm curious about some of the newer/more modern DBs, and whether anyone has had good or bad experiences with them!

Supabase developer here. I came from Firestore to Supabase due to running into a lot of limitations you're seeing. Just my biased opinion, but looking at "newer/more modern DBs" is not necessarily the route you want to take. That's why I looked at Firestore and ended up at Supabase. PostgreSQL is not "newer/more modern" but it's time-tested, battle-tested, and I know thousands of companies have used it in production for decades. I prefer to go with something I know works, will work at scale, and has tons of community and commercial support. FWIW
> developer burden imposed by needing to denormalize information.

> Then the other pain point is the "joins" use case;

We usually do that client side, with the aid of a web-component holding a ref to the (realtime db, not firestore) database path, and rendering its value. The payload is small as you only fetch data you use.

That works pretty well, even with long lists or grids; quotas/price on the realtime db are pretty generous.

What initially motivated you to use Firestore?