Hacker News new | ask | show | jobs
by nawgszy 2177 days ago
Could you say more about some of these times? I'm curious what it means in more concrete terms.

I only say this because I am a UI engineer who has been in some scrappy situations, so obviously I don't get to change the data model. What kind of structures do you find that "hit the database too hard" or otherwise invoke performance penalties?

1 comments

When I say "hit the database too hard" what I actually mean is making too many calls to the database. If I have to make three or four calls just to get all the data I need, performance is going to take a hit. And databases are often the worst-performing part of the tech stack, so you have to craft your queries around the performance bottleneck of the database.

So you have to balance your table structure around getting the data you need in as few calls as possible while also only returning exactly as much data as you need and no more. Because pulling extra data out puts more load on the database and shipping more data across the network is slower.

In general, relational databases (RBDs) like Postgres or MySQL are great for making sure your data follows the correct structure (this field is a string, this is an integer, this field in table1 has to be the same as this field in table2, etc) but make it hard to make changes to the data structure later. What happens if that int should actually be a float? Now you have to write a migration that makes fundamental changes to the structure of the data and hope there are no negative side effects.

Someone else mentioned NoSQL databases, which offer a lot more flexibility at the cost of the data integrity that RDBs enforce. If you suddenly want to store a float instead of an int, go ahead. No one is stopping you, you just need to make sure your code is updated to handle the possibility of getting back an int or float (or coerce the value to the right type and pray).

Basically a database is integral to almost all applications but they're complex monsters with their own structure and rules and performance implications. If you're building an application you really have to know the data structure of the final product before you even _start_ configuring the database.

> but make it hard to make changes to the data structure later. What happens if that int should actually be a float? Now you have to write a migration that makes fundamental changes to the structure of the data and hope there are no negative side effects.

This is often stated, but it's not true. RDBMS make it easier to make changes to the data structure precisely because you only have to write migration and after that you are guaranteed to only ever get the data out of database in the new format. Which means your application code doesn't accumulate the cruft of dealing with two possible database data formats.

I strongly disagree that this pattern makes it easier to make changes. It makes it easier to use the data (because you know this data will always be this type), but changing the data is harder than if it did not enforce a type.

I'm not arguing that un-typed data in a DB is easy to use or a good idea, just that it's harder to change the structure of RDBMS if you didn't plan properly from the beginning.

Basically: I think data is hard and I wish there were better tools for interacting with it.

Well, this was maybe a bit entry level description.

Q: "databases are often the worst-performing part of the tech stack" - compared to what? nginx throughput? I find this to be a bit of a strange view, surely business logic is always the slowest part of the tech stack

Q: "What happens if that int should actually be a float?" - how often do you actually need to run migrations versus just extending the data? From my end, I have a small idempotent database schema-maintaining tool, and if I need a new column or a new table there's no need for a migration, and you know your whole stack will interact with the new schema or old schema identically assuming you set sane defaults etc. I've built a lot of medium-quality low-traffic apps so I'm yet to encounter a real-world case where a migration wasn't just bad planning

>I'm yet to encounter a real-world case where a migration wasn't just bad planning

That's basically the root of the issue. Poor planning in your code means you re-write some code. Poor planning in your database means you have to start restructuring data, and if it's already running in production you have to hope you don't accidentally corrupt production data. It's a lot harder to restore corrupted data in production than it is to roll back a code deployment. And the answer to the problem is obviously just spending more time thinking about the proper data structure, which is the entirety of my complaint: I want my data to fit my application, I don't want to have to write my application to fit my data. I don't want to have to think "does this field belong in the Users table or the Accounts table or the [insert table here]".

I'm not sure what you mean by "just extending the data"... if I'm writing a Rails app and I need to change an int to a float, the way I do that is by writing and executing a migration.

As for the speed... a database typically stores its data on disk and is often not hosted on the same physical machine as the web server. Meanwhile the app and web server store a lot of things in-memory on the local server and even when it has to read from disk, it's a local disk attached to that machine. Check these numbers for how long it takes to read from memory (or even local disk) versus reading a remote disk over the network: https://gist.github.com/jboner/2841832

> >If it's already running in production you have to hope you don't accidentally corrupt production data

Or you write use a read replica to transform your data into a non-live DB and validate it before you put it into production, with backups of your final old schema available? Plus I really don't think a migration is that hard. Much harder than having a litany of shitty backends you have to glue together in your front-end app, trust me as someone who's done both.

I mean, I hear you, persisting data is hard, but that's not the database's fault, it's because you pick two of three on data: performance, persistence, and flexibility

> a database typically

But that's not the "worst performing" part of the stack, that's the highest latency part of the stack. Is there some specific reason you can not have co-located web & db servers? Also, is there a reason you still reference 2012 disk numbers when SSDs clearly have reduced all "disk" operations by an order of magnitude?

> just extending the data

What I mean by this is if you start with a minimal amount of columns in your database, and someone is like "we need new property x", it's easy to add X outside a migration - adding new columns or new tables does not require migrations if you don't modify existing columns

So this is my approach. Use as few bits as possible to persist your data, and then you can generally add new features migration-free