Would love to get some detailed, in-depth, and technically rich answers from people who upvoted this nonsensical article. It's literally a 100-word advertisement for a paid product.
I dunno. As a web dev, I find these "latest feature from vendor X" (typically Cloudflare or Vercel) posts quite helpful.
They let me know what the current big companies are doing to improve developer experience and/or devops. Their teams are often bigger than my entire company, so learning from them is a great way to see what we might be able to implement (whether by buying their service or doing a small scale version of it in-house).
Secondarily, they sometimes also show end-user experience improvements that I never thought would be possible at a certain scale/cost. Cloudflare hosting Doom and IRC on serverless is one such example, or Vercel doing multiplayer live editing (Figma style) out of the box with Next.js is another.
For this case of this particular post, the idea of having an external metadata analytics system for Mysql, stored in a separate store and dashboard, is something I've only vaguely seen in DigitalOcean. Seeing it as fully featured as in Planetscale makes me both wonder if we should consider using them instead of a some generic cloud DB, and also whether we can implement something similar on a standard LEMP container. It's one of the more interesting pieces I read on HN today, actually.
For those of us who don't live and breathe infrastructure, big(ger) tech company blog posts is a great way to learn, and I value them!
I'm not sure why you felt like you need to perform mental gymnastics over my comment. I said what I said because it's a valid point.
Let me help you understand it a little bit more in-depth.
First of all, the title used for this submission does not imply an announcement of a new feature. This one[0], however, does. Now, let's look past that. Let's say that the submission warrants front-page treatment fair and square because it has valuable content (a write-up at the very least). Lo and behold - it does not.
This submission[1] announcing Vercel's Build Output API has both a decent submission title (I can instantly tell it's a feature announcement) and fairly decent explanation for the feature, also. In fact, they even provided links to various documentations to help developers get started right away.
I'm still eagerly awaiting for someone to tell me what they learned from this specific submission.
I don't know what mental gymnastics you're talking about? I thought I was literally answering your question... I didn't know that rich metadata analytics for mysql existed in an easy to use GUI from this or any vendor. That's valuable knowledge I just learned from the post. My Cloudflare and Vercel examples were similar.
Now, if your issue was with the article title, I fully agree. I wish the titles here were clearer.
I didn't upvote it, but I think the product does inspire discussion about why we don't have basic important features like mandatory "last accessed" metadata on external state sources. Think of the number of hours of pain due to someone being "sure" a database/api/resource wasn't used, and then removing it. I'm not really put off by the advert for that reason.
Yes, the sensible default for this feature would be opt-out, like noatime, in case you need the performance boost and understand the implications of the choice.
The most generous hypothesis would be that Planetscale is an increasing popular product among the HN crowd - specifically, those who have to maintain large MySQL deployments. It seems like a very good product that is seeing something like viral adoption across those companies.
The next ingredient would be if dropping MySQL tables is a relatively big pain (or known but unmanaged risk) that most of those people have experienced at some point.
Then this feature would be exciting news for that not-so-small cohort and that seems sufficient for HN upvotes?
I don't know if that's what happened today. I've heard about planetscale a few times but never used it. I can attest that dropping tables is always scary.
I know of many great products that a lot of HN users use or have used in the past. Not once have I felt the need to submit their announcement for a new product feature because I feel like it would spark discussion or foster genuine inspiration.
Don’t get me wrong but clicking on this link I was expecting an actual detailed guide/explanation.
There is nothing anyone can say to justify this being on the front page considering both the submission title and the actual content of the said submission.
Isn't the safe way to do this is to rename the table and watch for errors before performing the drop? Queries fail on the old table name and can easily be recovered by reverting the name change.
If you happen to still be using it then you'd get a lot of errors right away though. If you are extremely paranoid you can do better by migrating to a new user that doesn't have access to the table. Switching to the new user would be a normal rolling deploy, so you'd only start with 1/N errors, and your normal rollback process gets automatically triggered.
Oh, that's clever. I have so rarely utilized the user management system in MySQL beyond just creating a user with the necessary permissions for each application. I hadn't ever thought of using new users to do canary deployments.
OP's solution is objectively worse, because it only reports access recency. Just because the table was or wasn't used recently, doesn't mean its safe to drop.
The code change may of landed the day before that disabled accessing that table, but the tool would warn that the table was still in use.
A quarterly cron job may access the table and go undetected if you look purely at the recency.
The rename strategy protects the data and offers easy rollback. This is a false heuristic.
Little Bobby Tables would like to have a word with you...
I honestly wonder if you could make a functioning database API that either does not accept strings as arguments, or can detect string concatenation and reject it. Not just a builder pattern to greatly discourage it, but a straight up exception on bad input type. Bind variables or GTFO.
The book "Building Secure and Reliable Systems" from Google's series on SRE actually talks about two examples of this in C++ and Go which forbid using anything but string literals in the query string of an SQL API.
In Go, the solution was very tidy: it aliases string to an unexported internal type that consumers cannot instantiate. String literals can be coerced to that type, but variables that already have type information associated with them are rejected at compile time.
The C++ solution was a bit more complicated and involved templates.
You can probably do it trivially in any language with operator overloading or a very expressive type system
However you would also probably want a language with some syntactic sugar that let users use your special string type easily otherwise the burden on users will be too high
> However you would also probably want a language with some syntactic sugar that let users use your special string type easily otherwise the burden on users will be too high
In the instance under discussion in that section of the book I'm referencing, allowing the user to instantiate the special string type was precisely the behavior intended to avoid. The idea being that in the following three examples, the first succeeds and the rest fail at compile time:
Query("SELECT * FROM Users WHERE id = ?;")
id := "6"; Query("SELECT * FROM Users WHERE id = " + id + ";")
id := 6; Query(fmt.Sprintf("SELECT * FROM Users WHERE id = %d;", id))
Sure you can. Just make your transport protocol only support taking in a stored procedure name and parameters for DMLs, and some typed representation for DDLs.
But while that prevents people from concatenating strings to form DML queries as a whole, it obviously doesn't prevent the kind of concatenation wereHamster mentioned.
I would argue the opposite, actually. The storage model or database layer should be in a single repo. And thus there would be less code to grep through.
That doesn't work because that repo needs to expose an API that accesses that table, now you have to prove the API is never called, and you're back to searching the entire code base for that function call (hopefully you've used meaningful function names instead of DRY ones...)
Add: log.info("get_user_by_user_id has been deprecated, please use the alternative API.")
Into the repo method, push up the change, redeploy the projects that depend on the library and wait a while and grep the logs to see if it ever gets used.
Dude, that just tells you if the table is used on the happy path. It does not tell you if it's only used by a customer whose godson is the CEO's favorite kid. Or if customers only use it right before a federal deadline for filing forms, or at the end of the quarter when it's time to generate reports. Which you didn't collect for 2 months.
Interesting feature, I think we can learn from this in our product.
IMO, it is still not safe even we know there are no queries running for this table. You may still meet a scenario that when you type `drop table`, another guy begins to run the query at the same time.
As the maintainer of another database, We have been trying our best to improve this scenario too.
Early on, we have provided a feature called `recover table` to recover table immediately after you wrongly drop the table. But this still can't avoid affecting current running queries on this table.
We call this problem `DDL affects DML`, and now we try to introduce a Table meta lock to guarantee that no any query is running when the DDL executed. We hope we can release this feature in the end of this year.
This is where the value of documentation comes in, as well as some process around your data model. If you need this to keep you safe you are doing something terribly wrong somewhere.