by lenkite 1589 days ago
sigh. My team is facing all these issues. Drowning in data. Crazy S3 bill spikes. And not just S3 - Azure, GCP, Alibaba, etc since we are a multi-cloud product.

Earlier, we couldn't even figure out lifecycle policies to expire objects since naturally every PM had a different opinion on the data lifecycle. So it was old-fashioned cleanup jobs that were scheduled and triggered when a byzantine set of conditions was met. Sometimes they were never met - cue bill spike.

Thankfully, all the new data privacy & protection regulations are a life-saver. Now, we can blindly delete all associated data when a customer off-boards, when a trial expires, or when data is no longer used for its original purpose. Just tell the intransigent PMs that we are strictly following govt regulations.
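
For anyone who has not written one: an expiry rule of the kind mentioned above is a small piece of configuration. A minimal sketch in Python, assuming the rule shape that boto3's `put_bucket_lifecycle_configuration` accepts. The prefix, rule ID, bucket name and 60-day window are made-up examples, not anything from the comment above.

```python
def expiry_rule(prefix: str, days: int, rule_id: str) -> dict:
    """Build one S3 lifecycle rule that expires objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(rules: list) -> dict:
    """Wrap rules in the top-level structure S3 expects."""
    return {"Rules": list(rules)}


# Hypothetical policy: trial data disappears 60 days after it was written.
config = lifecycle_configuration([expiry_rule("trials/", 60, "expire-trial-data")])

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config
# )
```

The point is that once the retention period is agreed, the storage service does the deleting and no cleanup job has to fire.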


The data protection regulations really are so freeing, huh. It's amazing to be able to delete all this stuff without worrying about having to keep it forever.
In the case of my previous employer, it led to an incredibly complicated encryption system. It took a couple of years to implement it in maybe 10% of the system. Deleting any old data was rejected.
I wonder sometimes if it would help if we collectively watched more anti-hoarding shows, in order to see how the consultants convince their customers they can get rid of stuff.
Humans spent their first 300k years as nomads – storing was just impossible and decrufting happened by itself when moving along.

So maybe that's why we're not good at it yet.

Being a renter definitely kept me lighter for a long time.

When you have to box things up over and over you find that the physical and mental energy around keeping it aren’t adding up. I wonder if migrating from cloud to cloud would simulate this experience.

Being a renter just taught me to batch my $STUFF I/O to minimize read-writes to disk and maximize available low-latency space. i.e. fill my bags to the brim with shit I didn't plan on using whenever I'd go to my parents'.
Two space garbage collector in action right there. Maybe all things software need a "move it or lose it" impetus. Features in apps, old data, you name it. If you've gotta keep transferring/translating it, it would definitely pare things down.
Also hoarding digital data is far easier than real. I wish I could have grep on real space.
How is encryption compliant? I’ve implemented GDPR data infrastructures twice now, and as far as I’m aware, the only way to be compliant with encryption is when you throw the decryption key away.
Sometimes it might be a single field in a 1MB nested structure that you have to remove. So it gets encrypted when the whole structure gets stored and when the field is to be deleted you just throw away the key instead of modifying the entire 1MB just to remove a few kB.
If you're comparing gov't regulations to delete data to saving a few KB, then I think you're looking at this wrong.
It's a few KB per record. In practice, when schemes like that are applied, it means "in total we can remove this key and not rewrite 10M rows across 3 data stores which itself would cost $$$ and make the database and incremental backups cry".
As mentioned, encrypt something and throw away the key, often called "crypto shredding".
Ahh I see, and that way you can quickly “remove” a whole lot of data by just removing the key, which makes for cheap operations, and/or more flexible workflow (you can periodically compact the database and remove entries for which you have no key).

Is my understanding correct?

Yes, but it's also that a lot of the data these days ends up in pseudo-append-only stores (like S3/Glacier, or many data warehouse products) where deletes/updates to old data are extremely expensive. Or you would have to scan petabytes of cold-stored data looking for a particular user's records. Throwing away the key is instant and "free".
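
To make the scheme in this subthread concrete, here is a rough Python sketch of crypto shredding: each user's data is encrypted under that user's own key, the ciphertext lives in a store that is never rewritten, and "deleting" means dropping the key. Everything here is an assumption for illustration. `ShreddableStore` and its methods are hypothetical names, and the cipher is a stdlib-only toy (HMAC-SHA256 keystream) so the example runs anywhere. A real system would use a vetted authenticated cipher and a proper key management service.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (HMAC-SHA256 in counter mode). Illustration only,
    # not a substitute for a vetted cipher from a crypto library.
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class ShreddableStore:
    """Hypothetical store: large write-once blobs, small mutable key table."""

    def __init__(self) -> None:
        self._keys = {}     # key_id -> 32-byte key (the only thing ever deleted)
        self._current = {}  # user_id -> key_id currently in use
        self._blobs = {}    # blob_id -> (key_id, nonce, ciphertext)
        self._next_id = 0

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id

    def put(self, user_id: str, plaintext: bytes) -> int:
        key_id = self._current.get(user_id)
        if key_id is None:
            key_id = self._new_id()
            self._keys[key_id] = secrets.token_bytes(32)
            self._current[user_id] = key_id
        nonce = secrets.token_bytes(16)
        stream = _keystream(self._keys[key_id], nonce, len(plaintext))
        blob_id = self._new_id()
        self._blobs[blob_id] = (key_id, nonce, _xor(plaintext, stream))
        return blob_id

    def get(self, blob_id: int):
        entry = self._blobs.get(blob_id)
        if entry is None:
            return None
        key_id, nonce, ciphertext = entry
        key = self._keys.get(key_id)
        if key is None:
            return None  # key was shredded, ciphertext is unreadable
        return _xor(ciphertext, _keystream(key, nonce, len(ciphertext)))

    def shred(self, user_id: str) -> None:
        # The cheap "delete": touches one small record, not the blobs.
        key_id = self._current.pop(user_id, None)
        self._keys.pop(key_id, None)

    def compact(self) -> int:
        # Optional, periodic: physically drop blobs whose key is gone.
        dead = [b for b, (k, _, _) in self._blobs.items() if k not in self._keys]
        for blob_id in dead:
            del self._blobs[blob_id]
        return len(dead)
```

After `store.shred("alice")`, every `get` on one of Alice's blobs returns `None` even though the ciphertext is still sitting in the store, and `compact()` can reclaim the space whenever it is convenient.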
That doesn't sound like something jeff_vader was talking about, since "deleting any old data was rejected" and this is definitely a way of deleting stuff.
Yep, having everything disappear at 2 months max is a life-saver.

That "absolutely essential thing" isn't essential any more when there is a possible GDPR/CCPA violation with a significant fine just around the corner.

Just make sure you actually test your backups. Two months of unusable backups are just as useful as no backups.
Well, you should have done this before GDPR too, but reminding people to test backups is never too late and never too often.
Now this is a spin I haven't heard before.
As a sysadmin I really wish you had. SO MANY problems have come to my desk because some dude 3 years ago did not consider retention or rotation and now I have to figure out what to do with a 4TB .txt that is apparently important.
"You never know when you might need this info to debug" The developer says as their cronjob creates a 250MB csv file, and a few MB of debug logs per day, for the past few years. "Disk is cheap" they say.

As a sysadmin, I hate that too.
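
A retention rule for that kind of cronjob output does not need to be fancy. A minimal sketch, assuming the files live in one directory and that modification time is a good enough proxy for age. `prune_old_files`, the glob pattern and the 30-day window are illustrative choices, not anyone's actual setup.

```python
import time
from pathlib import Path


def prune_old_files(directory, pattern: str, max_age_days: float, now=None) -> list:
    """Delete files in `directory` matching `pattern` whose mtime is older
    than `max_age_days`. Returns the names of the files that were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


# Hypothetical usage at the end of the nightly job:
# prune_old_files("/var/exports", "report-*.csv", max_age_days=30)
```

Run from the same cronjob that writes the files, this keeps the directory bounded instead of growing by 250MB a day for years.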

sometimes the data is just big...
Often a considerable portion of those logs are useless, trace level misclassified as info, kept for years for no reason.

You should keep a minimal set of logs necessary for audit, logs for errors which are actually errors, and logs for things which happen unexpectedly.

What people do keep are logs for everything which happens, almost all of which is never a surprise.

One needs to go through logs periodically and purge the logging code for every kind of message which doesn’t spark joy, I mean seem like it would ever be useful to know.
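
One way to enforce that rule at write time rather than during a later purge, sketched with Python's standard `logging` module. The filter class, the `audit` flag passed through `extra`, and the logger name are assumptions made for this example.

```python
import io
import logging


class KeepUsefulRecords(logging.Filter):
    """Keep warnings and errors, plus anything explicitly marked as audit."""

    def filter(self, record: logging.LogRecord) -> bool:
        is_audit = getattr(record, "audit", False)
        return record.levelno >= logging.WARNING or bool(is_audit)


def build_logger(name: str, stream) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # accept everything, let the filter decide
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(KeepUsefulRecords())
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


buf = io.StringIO()  # stands in for the file or log shipper you actually keep
log = build_logger("billing-job", buf)
log.debug("row 1 of 10000 processed")                         # dropped
log.info("job started")                                       # dropped
log.info("user 42 exported data", extra={"audit": True})      # kept for audit
log.error("upload failed: timeout")                           # kept
print(buf.getvalue(), end="")
```

Only the audit line and the error reach the stored log. The routine chatter can still go to a short-lived debug sink if someone really wants it.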

Find out how important it is with a `mv 4TB.txt 4TB.old` type of thing. See how many people come screaming.
Have you come up with a process, or an idea for a process to ensure this doesn't happen?

For instance, when they create a provisioning request, are you able to set an extremely low threshold? When they say that won't do, the cost increases and they're able to see/understand and start to care about the actual lifecycles of what they're creating?

Surely there is a way to project and monitor the cost of their resources over time, and deliver them an invoice on a regular basis? In other words, something like a cost attribution model? That way when the bills start to increase dramatically over time, pinpointing the heavy hitters becomes trivial, and when they come knocking on your door to "do something about it" you can just say "go talk to Bob".

I don't mean to sound like I'm trivializing the problem (honestly I can relate as I've gone through it myself), but I'd love to hear how anyone else has dealt with this issue effectively.
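
A bare-bones version of the cost attribution idea above, as a Python sketch. It assumes every billing line item carries a cost and, ideally, an `owner` tag. The record shape, the tag name, the 50% growth threshold and all the numbers are invented for illustration.

```python
from collections import defaultdict


def attribute_costs(line_items, tag: str = "owner") -> dict:
    """Sum cost per owner tag. Untagged spend is grouped under 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag) or "untagged"
        totals[owner] += item["cost"]
    return dict(totals)


def heavy_hitters(previous: dict, current: dict, min_growth: float = 0.5) -> list:
    """Owners whose spend grew by more than `min_growth` (0.5 means 50%)
    between two periods, largest absolute increase first."""
    flagged = []
    for owner, cost in current.items():
        before = previous.get(owner, 0.0)
        if before == 0.0:
            grew = cost > 0.0  # brand-new spend always counts
        else:
            grew = (cost - before) / before > min_growth
        if grew:
            flagged.append((owner, before, cost))
    return sorted(flagged, key=lambda row: row[2] - row[1], reverse=True)


# Made-up line items for two billing periods.
january = attribute_costs([
    {"cost": 100.0, "tags": {"owner": "bob"}},
    {"cost": 40.0, "tags": {"owner": "ann"}},
])
february = attribute_costs([
    {"cost": 250.0, "tags": {"owner": "bob"}},
    {"cost": 45.0, "tags": {"owner": "ann"}},
    {"cost": 10.0},  # nobody tagged this one
])

for owner, before, after in heavy_hitters(january, february):
    print(f"{owner}: {before:.2f} -> {after:.2f}")
```

With these numbers the report names `bob` first and then the untagged spend, which is the "go talk to Bob" list. The hard part in practice is getting the tags applied consistently, not the arithmetic.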

It comes down to monitoring, alerting, and followup. In other words, "good ops", which is lacking almost everywhere. Unfortunately that is always a moving target, with added complexity being that we're an external service provider and have limited authority in the client environment. Also, the sorts of companies that outsource their ops will also be willing to change providers multiple times, so it's often like trying to live in a library that has seen many generations of librarians each with their own ideas for how things ought to be organized.
You haven't heard it because it's not spin, it's from an engineer's point of view. That's not the view you hear in the news when it comes to these things.
HN seems like an odd place to assume that people only hear about things from the news and aren't engineers themselves.

i am a dev that has to deal with these regulations in my day to day. it is a pain, it is not freeing in any sense, and it makes my models worse.

granted, i think there are good reasons for it, but it does not make my life easier for sure.

Eh, Retention and Deletion are both pain for devs. Not having to care is the happy state.
Disclosure: I'm Co-Founder and CEO of a cloud cost company named https://www.vantage.sh/ - I also used to be on the product management team at AWS and DigitalOcean.

I'm not intentionally trying to shill but this is exactly why people choose to use Vantage. We give them a set of features for automating and understanding what they can do to manage and save on costs. We're also adding multi-cloud support (GCP is in early access, Azure is coming) to be a single pane of glass into cloud costs.

If anyone needs help on this stuff, I really love it. We have a generous free tier and free trial. We also have a Slack community of ~400 people nerding out on cloud costs.

https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fdocs.va...

I gave vantage.sh 5 minutes and did not see anything for S3 that is not already available from the built-in Cost Explorer, Storage Lens, Cost and Usage Reports, and taking 1 hour to study the docs https://docs.aws.amazon.com/AmazonS3/latest/userguide/Bucket...

Most "cloud optimisation" products want to tell you which EC2 instance type to use, but can't actually give actionable advice for S3. Happy to be corrected on this.

Saving people from learning how to use Cost Explorer, Storage Lens, Cost and Usage Reports - and then taking 1 hour to study documentation - sounds to me like a legitimate market opportunity.
Not really. Sometimes you actually have to understand things. If you're so concerned about your billing, someone on your team should probably invest a freaking hour to understand it. If that can't happen, you are just setting yourself up for failure.
I've been learning the ins and outs of the three major providers' cloud billing setups for the last year, and I'm just getting started. This is not a 1 hour job, but you're right that someone on your team needs to understand it.
At my last job we had a team spend an entire quarter just to help visualize and properly track all of our AWS expenditures. It's a huge job.
It's a lot more than an hour, in my experience.
We are in process of updating the documentation because you're right that it needs more work. For the record, if you're doing everything on your own via Cost Explorer, Storage Lens and processing CUR you may be set. From what we hear, most folks do not want to deal with processing CUR (or even know what it is) and struggle with Cost Explorer.

Vantage automates everything you just mentioned to allow you to make quicker decisions. Here's a screenshot of what we do on a per-S3-bucket basis: https://s3.amazonaws.com/assets.vantage.sh/www/s3_example.pn...

We'll profile storage classes and the number of objects, tell you the exact cost of turning on things like intelligent tiering and how much that will cost with specific potential savings. This is all done out of the box, automatically - and we profile List/Describe APIs to always discover newly created S3 Buckets.

From speaking with hundreds of customers, I can also assure you that at a certain scale, billing does not take an hour...there are entire teams built around this at larger companies.

Vantage is a seriously awesome product. We love it at PlanetScale. Obviously being a cloud product things can get pricy and so Vantage is essential.
I love vantage. Thank you for making it.
I work on a team that computes bills; shoot me a Slack invite and perhaps I can offer insight.
Are you multi-cloud because your customers need you to be multi-cloud?
Yes, geographically diverse customers who prefer different cloud platforms.
I host stuff on AWS, but I am pretty sure that hosting on my own server or a server an IT service provider maintains is much cheaper.
Did you include maintenance, patching and machine upgrades? Because it's likely not.