by felixr 1571 days ago
Worth noting "rows read" is not "rows returned". If you do 1000 full table scans on a table with a million rows, you get a billion reads.

https://docs.planetscale.com/concepts/billing#understanding-...
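To make the arithmetic above concrete, here is a minimal sketch. The function names and the price in the usage note are hypothetical placeholders, not PlanetScale's actual rates.

```python
def rows_read(full_scans: int, table_rows: int) -> int:
    """Rows the database has to read: every full scan touches every row."""
    return full_scans * table_rows


def estimated_cost(rows: int, price_per_billion_rows: float) -> float:
    """Cost under a hypothetical flat price per billion rows read."""
    return rows / 1_000_000_000 * price_per_billion_rows


# 1000 full scans over a million-row table, even if each query
# only *returns* a single row to the client.
total_rows_read = rows_read(full_scans=1000, table_rows=1_000_000)
total_rows_returned = 1000 * 1
```

With a made-up price of 1.50 per billion rows read, `estimated_cost(total_rows_read, 1.50)` comes to 1.50, while the rows returned number only a thousand.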


I still don't see how this pricing model is usable in a database with a query planner. The user is not in control of the query plan, the database is. This is a recipe for disaster, and I'd never feel comfortable with this. Even with a good understanding of how the database works, there is pretty much no way to ensure that the database doesn't do some surprising full table scans.
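As a rough illustration of how a plan can quietly turn into a full table scan, here is a sketch that asks the planner what it intends to do. It uses SQLite's `EXPLAIN QUERY PLAN` through Python's standard library rather than MySQL or PlanetScale, and the `users` table, index, and helper names are made up for the example. It only shows the plan at the moment you ask; it does not guarantee the planner will choose the same plan later.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); detail is the readable text.
    return [row[3] for row in rows]


def does_full_scan(conn, sql, params=()):
    """True if any plan step is a SCAN (a walk over the whole table or index)."""
    return any(d.startswith("SCAN") for d in plan_details(conn, sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

arg = ("a@example.com",)
# Equality on the indexed column: the planner can search the index.
indexed = does_full_scan(conn, "SELECT * FROM users WHERE email = ?", arg)
# Wrapping the column in a function: the plain index no longer applies.
wrapped = does_full_scan(conn, "SELECT * FROM users WHERE lower(email) = ?", arg)
# Filtering on a column with no index at all.
unindexed = does_full_scan(conn, "SELECT * FROM users WHERE name = ?", arg)
```

The second query looks almost identical to the first, which is the kind of small change that can multiply rows read without changing rows returned.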
Many young founders won't have the experience or the patience to really understand this, and in practice most won't have the scale to really feel the pain anyway. The ones that do will be worth so much they can negotiate custom contracts. Such is the market for pickaxes created by easy VC money.
My big problem with SQL is that I want to do the query planning. Sure, the database can often do a better job -- but I want to be able to guarantee upper bounds on things, damn it! I've seen so many weird edge-case performance problems that boiled down to poor query plans. Things will be fast most of the time and then, suddenly, the database will look at its stats and decide to do a full table scan, and then everything goes straight to hell and stays there for too many milliseconds.

If I ever cross paths with a database, I'll spit on its shadow.

True. Even with tuned indices, table scans are pretty frequent, and sometimes one would prefer a slower but cheaper query. It'd be quite stressful to have to worry about which query plans get executed, because they can vary quite a bit across very similar queries, and development teams can nudge them towards more expensive ones.
If your application allows somewhat flexible queries this might also create potential Denial of Money attacks, which I personally find scarier than plain old Denial of Service. I mean that's always a risk with this type of pricing, but the particulars of a relational database with a query planner make this a lot more dangerous.
This is why Google Cloud Firestore/Datastore charges for rows returned, but the caveat is you can't do stuff like aggregations, as the cost just doesn't work out.
This comment has to be higher up. This pricing is like a sword over the neck that drops when you screw up or the SQL planner screws up.
DynamoDB is the same, it charges you for the number of reads/writes you do, so if you're doing full table scans on massive databases, you're going to have a bad time.
With the caveat that ddb extensively documents how you will get billed, down to the request size.

I'm all for the AWS hate when it's deserved, but if you get screwed by it on billing, you didn't read.

There is a difference though: you don't have a query language that makes it easy to do, and the actual technology pushes you to make a different, more correct choice.
First of all, I agree there are gotchas. We have roadmap items that will eliminate this problem in the medium term.

I do however disagree that this is any worse than idle RDS hosts sitting around when you have no traffic, costing you huge sums for a service that is basically `apt-get install mysql-server` on top of EC2.

> RDS hosts sitting around when you have no traffic, costing you huge sums for a service that is basically `apt-get install mysql-server` on top of EC2.

RDS gives you automatic backups, automatic failover with DNS, easy upgrades, IAM authentication, an API for manipulating your database instances, and more.

As CEO, it damages your company's credibility when you say it's just `apt-get install mysql-server`. Please do better.

Strongly agree. There is a huge amount of value in what RDS provides (over DIY EC2).

Fellow HN user quinnypig described it succinctly:

  RDS is a huge win not because of anything intrinsic to what the platform actually is, but because we collectively suck at setting up and managing replication, backups, etc.
https://twitter.com/QuinnyPig/status/1173377290815721473
One of the things about a $29/mo RDS instance, though, is that if you're doing full table scans over a million rows, it's going to grind to a crawl and immediately alert you (not explicitly, but via the performance hit) that you're doing something wrong with indexing. Effectively, it's a hard budget cap, and that's super useful for budget-conscious organizations and individuals.

Does PlanetScale have functionality to provide budget alerts? Can it speed-throttle calls that can't run efficiently without further optimization or a budget increase, which is what that CPU-capped RDS instance effectively does?

In other words, can I tell PlanetScale to cap my usage and throttle query speed if I exceed row scan limits, rather than blocking me entirely or charging my credit card more? If it doesn't yet have those capabilities, then I think it's fair to say it can easily be worse than an idle RDS host sitting around.

I have built systems for deploying both MySQL and Postgres setups with backups and replication and failover, and while it's simple enough to do, describing RDS as "apt-get install mysql-server" is a gross oversimplification.

I'd roll my own again for many reasons, not least because AWS is ridiculously overpriced, but if you're first using EC2 and so already taking the costs of being on AWS, I'd recommend people use RDS over rolling their own any day unless they're very familiar with what proper redundancy and backup strategies entail.

A high but predictable cost is completely different from an unpredictable cost, risk-wise.

Hopefully you are not running a company nor anything that involves risk taking.

he is the CEO lol
That is wasteful too, but deterministically wasteful. You know exactly how wasteful you are being and how much it will cost. No surprises.
After that comment, I hope you have a CTO who keeps you away from the engineering decisions.
This limitation prohibits the use of joins until a session parameter like 'set max_logical_reads to xxx' is available.
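The `max_logical_reads` parameter above is the commenter's wish, not an existing setting. As a sketch of the same idea in a different engine, SQLite (via Python's standard library) lets a client abort a query once it has done too much work, using `Connection.set_progress_handler`. The budget here is counted in SQLite virtual-machine instructions, which is only a loose proxy for rows read, and `run_with_budget`, `WorkBudgetExceeded`, and the table `t` are hypothetical names for the example.

```python
import sqlite3


class WorkBudgetExceeded(Exception):
    """Raised when a query is aborted for exceeding its work budget."""


def run_with_budget(conn, sql, params=(), max_steps=10_000, granularity=100):
    """Run a query, aborting it after roughly max_steps VM instructions."""
    steps = 0

    def on_progress():
        nonlocal steps
        steps += granularity
        # A non-zero return value tells SQLite to interrupt the query.
        return 1 if steps > max_steps else 0

    conn.set_progress_handler(on_progress, granularity)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.DatabaseError as exc:
        if steps > max_steps:
            raise WorkBudgetExceeded(f"aborted after ~{steps} steps") from exc
        raise  # unrelated database error: pass it through unchanged
    finally:
        conn.set_progress_handler(None, granularity)  # remove the handler


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO t (payload) VALUES (?)",
    ((f"p{i}",) for i in range(50_000)),
)

# Primary-key lookup: a handful of steps, well under the budget.
cheap = run_with_budget(conn, "SELECT payload FROM t WHERE id = ?", (42,))

# Unindexed filter: forces a walk over all 50,000 rows and blows the budget.
try:
    run_with_budget(conn, "SELECT count(*) FROM t WHERE payload = ?", ("nope",))
    aborted = False
except WorkBudgetExceeded:
    aborted = True
```

A server-side limit would be preferable, since a client-side guard like this only helps for queries that go through it.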
They're not alone in this approach. BigQuery and DynamoDB also meter data usage based on the amount of data processed during a query.
BigQuery allows you to specify maximum billed bytes for a query to avoid situations like this. You can also purchase slots for fixed cost unlimited queries.
Authzed[0] also does something similar with SpiceDB[1], but charges based on query complexity for the levels of nesting in the graph traversal, rather than the actual number of rows affected. A flat query is easy to compute, thus cheap.

[0]: https://authzed.com/pricing

[1]: https://github.com/authzed/spicedb

One difference with DynamoDB is that there's no query planner, so you can have a pretty good sense of how many items you'll hit and how big that read is.
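A sketch of the kind of back-of-the-envelope estimate this makes possible. It assumes the read accounting as I understand it from AWS's documentation: reads are metered in 4 KB units, an eventually consistent read costs half of a strongly consistent one, and a Query or Scan sums the sizes of the items it reads before rounding up. Minimum charges, filters, indexes, and transactional reads are not modeled, and the function names are made up.

```python
UNIT_BYTES = 4096  # assumed size of one read unit (4 KB)


def _units_for_bytes(total_bytes: int) -> int:
    """Round a byte count up to whole 4 KB units."""
    return -(-total_bytes // UNIT_BYTES)  # ceiling division


def get_item_read_units(item_bytes: int, consistent: bool = False) -> float:
    """Estimated read units for fetching one item by key."""
    units = _units_for_bytes(item_bytes)
    return units if consistent else units / 2


def query_read_units(item_sizes_bytes, consistent: bool = False) -> float:
    """Estimated read units for a Query or Scan that reads these items."""
    units = _units_for_bytes(sum(item_sizes_bytes))
    return units if consistent else units / 2


# One 400-byte item by key versus scanning a million such items.
single = get_item_read_units(400)
full_scan = query_read_units([400] * 1_000_000)
```

Because you choose the key condition yourself, the list of item sizes is something you can reason about ahead of time, which is the commenter's point.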
Athena as well