
by mattlord 1482 days ago

Installing the time zone tables on a single instance is certainly not hard: https://dev.mysql.com/doc/refman/8.0/en/time-zone-support.ht...

The trickier part is orchestrating the ongoing management of that across a large dynamic fleet. And in this case, it was about much more than simply loading the tables; it was about using them to support importing databases into PlanetScale: https://github.com/vitessio/vitess/pull/10102

I'll link to my other comment on the billing issue: https://news.ycombinator.com/item?id=31509240

We've had to make some other changes to our MySQL fork as well, and those will show up there, but we'd love to not have any patches! We'd love to keep the patch set minimal (just as Amazon certainly does with RDS and Aurora). And I would certainly argue that Vitess, which is what we build PlanetScale around, is a meaningful piece of technology that pairs with MySQL to make a great database: https://vitess.io. You're of course free to disagree, and I wish you all the best as you work to build something great in the future.

1 comment

throwusawayus 1482 days ago

what other managed sql DBs charge based on rows read, regardless of whether they are on-disk or in-memory? honest question. i am familiar with a number of managed mysql and postgres products, and none of them bill this way that i have ever seen

and for the record, despite planetscale staffers repeatedly denigrating rds (your competitor) on hn, aurora’s patch set is not “minimal”

i do think vitess is cool for what it’s worth. i just think your managed db product has bananas billing and also is horrendously overhyped, and your ceo’s responses to criticism are very reminiscent of theranos or wework’s responses to same


mattlord 1482 days ago

I doubt that anyone would claim their billing metrics are perfect. If you find some specific workload that's actually cheaper on another serverless database offering then we'd love to hear about it (we strive for transparent, generous pricing). If you don't think that CPU-usage-based pricing, which is typical for serverless offerings and is, e.g., what Aurora Serverless uses with Aurora Capacity Units (ACUs), is charging you for reads of cached data, then I've got some bad news for you. :-) You're almost certainly being charged for reading the "row" from the network, write-ahead logging for it and other ACID/MVCC-related overhead, writing it to the block device, reading it from the block device, reading it from memory, writing it to memory, sorting and comparing [pieces of] it, and writing it back to the network; all of these things take CPU cycles. I find this argument to be entirely missing the point.

Pointing out that surely Amazon would like to keep their patch set to a minimum (there's a high cost in maintaining custom patches as you upgrade MySQL) is in no way implying that their patch set is small. Minimal means the minimum required for what you need, rather than being some point of pride.

I'm certainly not on here bashing any other offerings. Between the two of us, I only see one person trolling / bashing. :-) With that, I will leave you to your opinions which you are of course free to have. Best of luck.


throwusawayus 1482 days ago

aurora serverless pricing is not based on cpu cycles. this is just not how ACUs actually work or scale or are priced, at all man

anyway i gather the answer to my question is that no, there are no other examples of managed sql dbs that bill the way you do. my complaint is this is inherently not transparent because it violates user expectations. users try comparing to io based providers and fail to understand the pricing math comparison (on io pricing 1 read = many rows) or caching implications (on io pricing, cached rows don’t count as io)
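To make the pricing math in the comment above concrete, here is a minimal sketch of the two billing models being compared. Everything in it is an assumption for illustration: the function names, the figure of 100 rows per page, and the cache hit percentages are hypothetical and do not describe any provider's actual metering.

```python
def billed_rows(rows_scanned: int) -> int:
    """Row-based billing: every scanned row counts, cached or not."""
    return rows_scanned


def billed_read_ios(rows_scanned: int, rows_per_page: int, cache_hit_pct: int) -> int:
    """I/O-based billing: only page reads that miss the cache are counted.

    cache_hit_pct is an integer from 0 to 100 (a hypothetical input, kept
    integral so the example avoids floating-point rounding).
    """
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    if not 0 <= cache_hit_pct <= 100:
        raise ValueError("cache_hit_pct must be between 0 and 100")
    # One page read can return many rows, so count pages rather than rows.
    pages_touched = -(-rows_scanned // rows_per_page)  # ceiling division
    # Pages served from memory generate no billable I/O (rounded up).
    return -(-pages_touched * (100 - cache_hit_pct) // 100)


rows = 1_000_000
print(billed_rows(rows))               # 1000000 billable rows
print(billed_read_ios(rows, 100, 0))   # 10000 billable I/Os, nothing cached
print(billed_read_ios(rows, 100, 90))  # 1000 billable I/Os, 90% cached
```

The two outputs are in different units (rows versus I/Os), so comparing actual cost still requires each provider's unit price, which is the comparison the comment says users get wrong.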

as for denigrating rds, look to your ceo’s past hn comments. i would link to it, but last time i did that i got flagged, despite it being a recent thread that i was directly participating in


mattlord 1482 days ago

It's fairly difficult to find actual details on ACUs and how it all works; the best I found after spending significant time looking was things like: https://www.jeremydaly.com/aurora-serverless-the-good-the-ba...

According to AWS you're paying for chunks of CPU and memory on a per second basis: https://aws.amazon.com/rds/aurora/faqs/

It's hard to imagine that the CPU capacity is measured in anything other than CPU cycles (time slices of physical capacity) — in the same way it's hard to imagine that the memory capacity is measured in anything but bytes. But whatever, I don't care. It's cool, good for them. The point was... you don't think you're paying for reads of records that are cached? I give up, I fail to see how this can really be a good faith discussion.

I don't know how all other serverless database offerings do pricing. What difference does it make? They're all different. As a user, you want it to be based on your usage and to be fairly and reasonably priced while also being easily audited and predictable. Those are the key properties I would care about.

I honestly cannot see how you could be missing the point by this much and still be operating in good faith so I'll for real, for real stop. :-)


throwusawayus 1482 days ago

you just are not understanding my point, that does not mean i am acting in bad faith! jeez

i originally said pricing for other managed sql dbs, not specifically “serverless” ones. we both know that is just a marketing term anyway

with ACUs the point is you configure min and max, and your cluster scales up/down based on a cpu utilization threshold. so, sure, reading from memory uses cpu cycles, but a large cached read is incredibly unlikely to bump you over a scaling threshold which affects your bill, unless you’re doing some huge heavy sort operations

another key point is aurora serverless v2 does not scale down to 0 acu. you are always paying a predictable small amount for your base cpu and ram. minor increases in cpu usage literally do not impact your bill at all, which is why i do not believe your argument makes sense regarding cached reads.
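The scaling argument in the two paragraphs above can be sketched as a toy model. This is only an illustration of the behavior the comment describes (a configured min and max, with capacity changing when CPU utilization crosses a threshold); it is not AWS's actual algorithm, and the function name, the 70%/30% thresholds, and the 0.5 ACU step are all hypothetical.

```python
def next_capacity(current_acus: float, cpu_utilization: float,
                  min_acus: float, max_acus: float,
                  scale_up_at: float = 0.70,    # hypothetical threshold
                  scale_down_at: float = 0.30,  # hypothetical threshold
                  step: float = 0.5) -> float:  # hypothetical step size
    """Return the billed capacity after one scaling decision (toy model)."""
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0.0 and 1.0")
    if min_acus > max_acus:
        raise ValueError("min_acus must not exceed max_acus")
    if cpu_utilization > scale_up_at:
        target = current_acus + step
    elif cpu_utilization < scale_down_at:
        target = current_acus - step
    else:
        # Utilization moved but stayed inside the band: capacity is unchanged.
        target = current_acus
    # Capacity never leaves the configured range, so it never reaches 0
    # when min_acus is above 0.
    return max(min_acus, min(max_acus, target))


print(next_capacity(2.0, 0.40, 0.5, 8.0))  # 2.0
print(next_capacity(2.0, 0.45, 0.5, 8.0))  # 2.0, small CPU bump, same bill
print(next_capacity(2.0, 0.90, 0.5, 8.0))  # 2.5, threshold crossed
```

Under these assumptions, extra CPU work changes the bill only when it pushes utilization across a threshold, which is the distinction the comment draws against per-row metering.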

edit to add: the reason this matters for monetary cost of ELT/ETL is it often involves very large reads. if your jobs only extract recent/changed data, this will very likely be in the buffer pool, and cost way less with io pricing than with your row based pricing. clear?
