by getcrunk 1975 days ago
As a team of just me with not much savings, trying to make something that may grow quickly (1-10s of TBs), I can't afford to scale up from the get-go. I need to be able to scale out first. It doesn't seem Postgres is conducive to that. AFAIK clustering or sharding is not yet offered by the core.

In addition, I need to look more deeply into Postgres, but its considerations for things such as lazy replication (unreliable connections or offline/mobile users) don't seem any less complicated than other options.

3 comments

You can scale PG horizontally with read replicas. Many frameworks support this (rails, for example: https://dev.to/schwad/how-to-use-horizontal-sharding-in-rail... )

I don't know your business, but scaling to 1TB of data (not just files, but actual rows in a database) seems weird. Feel free to prove me wrong (I'm just some person on the internet, after all), but I've seen many startups fool around with over-architecting instead of building features users want.

If, on the other hand, you're positive you're going to need to handle that kind of data from the start, I'd read up on real life architectures that are similar to yours: http://highscalability.squarespace.com/blog/category/example

This advice applies IMHO. https://www.cybertec-postgresql.com/en/postgres-scaling-advi...

"So, you’re building the next unicorn startup and are thinking feverishly about a future-proof PostgreSQL architecture to house your bytes? My advice here, having seen dozens of hopelessly over-engineered / oversized solutions as a database consultant over the last 5 years, is short and blunt: Don’t overthink, and keep it simple on the database side! Instead of getting fancy with the database, focus on your application. Turn your microscope to the database only when the need actually arises, m’kay! When that day comes, first of all, try all the common vertical scale-up approaches and tricks. Try to avoid using derivative Postgres products, or employing distributed approaches, or home-brewed sharding at all costs – until you have, say, less than 1 year of breathing room available."

I just read the original article myself before seeing your post. The last bit about "sharding" by partitioning your db from the start helped alleviate a lot of my concern, I think. All my app's data doesn't actually need to be in the same DB for 90% of my use cases (userid in range -> db x).
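To make "userid in range -> db x" concrete, here is a minimal sketch of range-based routing. The shard boundaries and connection strings are made up for illustration, and this only picks a database; it does not handle moving users between shards.

```python
from bisect import bisect_right

# Hypothetical shard map: (exclusive upper bound of the user id range, DSN).
# Both the ranges and the host names are assumptions for this sketch.
SHARDS = [
    (1_000_000, "postgresql://db0.example.internal/app"),
    (2_000_000, "postgresql://db1.example.internal/app"),
    (3_000_000, "postgresql://db2.example.internal/app"),
]
_UPPER_BOUNDS = [upper for upper, _ in SHARDS]


def shard_for_user(user_id: int) -> str:
    """Return the DSN of the database that holds this user's rows."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    # bisect_right finds the first range whose upper bound is > user_id.
    idx = bisect_right(_UPPER_BOUNDS, user_id)
    if idx == len(SHARDS):
        raise LookupError(f"no shard configured for user_id {user_id}")
    return SHARDS[idx][1]
```

Adding capacity then means appending a new range to the map, which works as long as queries never need to join across users.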
You may be able to set up a multi-tenant db. I've done this in the past: a new schema was created for each tenant. It worked well. It never got to the size where it had to scale out, so I'm not sure how easy it would have been to move to multiple databases.
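A rough sketch of what schema-per-tenant can look like, not the exact setup described above. The `tenant_` prefix and the `notes` table are hypothetical; the functions only build the SQL strings, and running them through a driver is left out.

```python
import re

# Only allow simple lowercase names so the tenant can be used as an identifier.
_TENANT_RE = re.compile(r"[a-z][a-z0-9_]{0,30}")


def tenant_schema(tenant: str) -> str:
    """Map a tenant name to its schema name (the "tenant_" prefix is an assumption)."""
    if not _TENANT_RE.fullmatch(tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return f"tenant_{tenant}"


def provisioning_sql(tenant: str) -> list[str]:
    """Statements to run once when a tenant signs up."""
    schema = tenant_schema(tenant)
    return [
        f'CREATE SCHEMA IF NOT EXISTS "{schema}"',
        # "notes" is a hypothetical table; every tenant gets its own copy.
        f'CREATE TABLE IF NOT EXISTS "{schema}".notes '
        "(id bigserial PRIMARY KEY, body jsonb NOT NULL)",
    ]


def search_path_sql(tenant: str) -> str:
    """Statement to run per connection so unqualified table names hit this tenant."""
    return f'SET search_path TO "{tenant_schema(tenant)}"'
```

Since each tenant lives in its own schema, moving one to another database later would presumably come down to dumping and restoring that schema, though that part was never tested here.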
If your business grows so quickly that one Postgres instance can't cope, by that time hopefully you should have enough revenue (or investors) to add a few read replicas.

Also, are you planning on storing large blobs in the database? If so, consider storing them in a blob store (e.g. S3) and only storing their path in the DB.

Think something like Google Keep or Trello or Airtable. It's just a lot of text/JSON. It's in a db and not blobs or an object store due to sync, filtering, tagging, reporting... Larger assets (img) will be in an object store.

Maybe it's not supposed to be in a DB, but using my export of Google Keep as a guide, on average my larger files are 10-20 KB of JSON. Multiply that by 1000 for unlimited* storage, then multiply that per user, and you're over 1 TB.
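Spelling that estimate out as a back-of-envelope sketch: this assumes decimal units (1 TB = 10^12 bytes), 1000 notes per user, and counts raw JSON only, with no indexes or row overhead.

```python
TB = 10**12  # decimal terabyte; an assumption for this estimate


def bytes_per_user(note_bytes: int, notes_per_user: int = 1000) -> int:
    """Raw JSON stored for one user."""
    return note_bytes * notes_per_user


def users_to_reach(target_bytes: int, note_bytes: int, notes_per_user: int = 1000) -> int:
    """Smallest number of users whose combined data reaches target_bytes."""
    per_user = bytes_per_user(note_bytes, notes_per_user)
    return -(-target_bytes // per_user)  # ceiling division


for note_kb in (10, 20):
    per_user_mb = bytes_per_user(note_kb * 1000) / 1e6
    users = users_to_reach(TB, note_kb * 1000)
    print(f"{note_kb} KB notes: {per_user_mb:.0f} MB per user, {users:,} users to reach 1 TB")
```

So under these assumptions it takes somewhere between 50,000 and 100,000 users who each fill their quota before the raw data passes 1 TB.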

Even in that case Postgres will still work, and there will be enough time to scale up manually. But I made my original post from the perspective of architecting something on this scale or beyond.

And so maybe I am over-architecting; then the advice from both of you about PG still applies. (You guys are ultimately right, practically.)