by ericb 872 days ago
Very cool!

Feature request: I have really struggled with turning off the thing that is costing me money in AWS.

If, with the right master credentials, I could consistently and easily do that somehow, that'd be a 10x feature. If you made that use case free, you'd get tons of installations from people who desperately need this at the top of your sales funnel.

edit: This used to say "in your app" and that wasn't quite what I want, so I changed that language, but jedberg's objections in the comments below were, and are, valid concerns with what I was stating and any implementation.

7 comments

We solved that exact problem with our open source tool Resoto. Specifically our "Defrag" module, which cleans up unused and expired resources:

https://resoto.com/defrag https://github.com/someengineering/resoto

The magic behind the cleanup is Resoto's inventory graph - the graph captures the cleanup steps for each individual AWS resource.

One of Resoto's users, D2iQ (now part of Nutanix), reduced their monthly cloud bill by ~78%, from $561K to $122K. There's a step-by-step tutorial on our blog showing how they did it.

I don't mean to hijack Dashdive's thunder here though, congrats on the launch!

This is the first time I'm hearing about your company but I am already a huge fan of the tools you're building. I've had a need for most of them and next time I am met with those needs, I will use your tools. Please keep going!
That would be a security nightmare. You don't want to give such powerful credentials to anyone, much less a 3rd party.

But a good stopgap would be a feature to spit out an API command that someone could run (or a CloudFormation or TF file) where you can put your own credentials in and run it yourself.
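A minimal sketch of that stopgap, assuming the tool already knows the instance ID and region: it holds no credentials and only prints a command for you to run under your own profile. `aws ec2 stop-instances` and its `--instance-ids`, `--region`, and `--dry-run` options are real AWS CLI options; the function name and the sample instance ID are hypothetical.

```python
import shlex


def stop_instance_command(instance_id: str, region: str, dry_run: bool = True) -> str:
    """Build (but never execute) an AWS CLI command that stops one EC2 instance."""
    args = [
        "aws", "ec2", "stop-instances",
        "--instance-ids", instance_id,
        "--region", region,
    ]
    if dry_run:
        # --dry-run asks AWS to check permissions without stopping anything.
        args.append("--dry-run")
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical instance ID, for illustration only.
    print(stop_instance_command("i-0123456789abcdef0", "us-east-1"))
```

Defaulting to `--dry-run` means the first copy-paste only checks permissions; the user drops the flag once they are sure it is the right instance.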

If you could only turn things off then perhaps that is less than a nightmare in some settings.
The only way it would be effective is if that credential had broad abilities to destroy, and I wouldn't want such a credential to get stolen. Honestly, it would be bad enough for even your most trusted operator to have it.

The best way to do it would be to run the delete with no access, see what permission errors you get, and then only give those permissions until you've successfully deleted the object.

The safest way (but obviously more work) to do one-off work like this is to start with no permissions and slowly open up. There are tools that can help with this by extracting the permission errors and generating the files to update the permissions.
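Roughly what such a tool does, as a sketch rather than any particular tool's implementation: collect the denial messages from failed attempts, pull out the action and resource, and emit a policy that allows only those. The regex assumes the common "is not authorized to perform: <action> on resource: <arn>" wording; some services return differently shaped or encoded messages, so treat the pattern, the function name, and the sample ARNs as assumptions.

```python
import json
import re

# Assumed wording of an IAM denial message; real messages vary by service.
DENIED = re.compile(
    r"is not authorized to perform: (?P<action>[A-Za-z0-9-]+:[A-Za-z0-9*]+)"
    r"(?: on resource: (?P<resource>\S+))?"
)


def policy_from_errors(errors):
    """Turn denial messages into a minimal IAM policy allowing only what was denied."""
    actions_by_resource = {}
    for message in errors:
        match = DENIED.search(message)
        if not match:
            continue  # not a permission error, ignore it
        # Fall back to "*" when the message names no resource.
        resource = (match.group("resource") or "*").rstrip(".,")
        actions_by_resource.setdefault(resource, set()).add(match.group("action"))
    statements = [
        {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        for resource, actions in sorted(actions_by_resource.items())
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Hypothetical error messages, for illustration only.
    sample = [
        "User: arn:aws:iam::123456789012:user/alice is not authorized to perform: "
        "dynamodb:DeleteTable on resource: "
        "arn:aws:dynamodb:us-east-1:123456789012:table/old-events",
        "Connection timed out",
    ]
    print(json.dumps(policy_from_errors(sample), indent=2))
```

You would rerun the delete after each policy update and feed any new errors back in, so the credential never grows beyond what that one deletion needed.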

Perhaps give Flowpipe [1] a try? It provides "Pipelines for DevOps", including a library of AWS actions [2], that can be run on a schedule (e.g. daily) or a trigger (e.g. instance start) to do things like turn off, update or delete expensive assets. It can also be combined with Steampipe [3] and the queries from the AWS Thrifty mod [4] to do common queries. We'd love your feedback if you give it a spin!

1 - https://github.com/turbot/flowpipe
2 - https://github.com/turbot/flowpipe-mod-aws
3 - https://github.com/turbot/steampipe
4 - https://github.com/turbot/steampipe-mod-aws-thrifty

What you are saying might be problematic. This is a naive approach towards saving cost in the cloud. I would imagine that companies who would pay big bucks for a platform like this optimize cost by managing commitments on Reserved Instances over 1-year or 3-year terms, or sometimes monthly via non-AWS platforms. The net savings are much higher than from just turning instances off and on.

Managing cost without impacting any infrastructure is the right way to do it in my opinion.

It's very easy to do with Cloudchipr (https://app.cloudchipr.com). It lets you sort and filter by product family (Compute, DataTransfer, Storage, etc.), then drill down the tree and identify the most expensive resources in each category.

The nice part here is that when you see an instance burning $3000 a month, you can click on it and see the cost breakdown by traffic, compute, etc.

But is there a way to stop that instance, though?
Sure thing! You can stop, delete, or back up and delete. It supports all the actions that the cloud provider supports.
The CNCF project Cloud Custodian fits these sorts of use cases pretty well, and supports periodic or event-based triggers. https://cloudcustodian.io/docs/aws/examples/
This is an interesting point, and we could definitely consider something like this. Where exactly do you run into problems?

For example, let's say you've figured out that a particular EC2 instance or database is too costly. What is the sticking point for you in turning it off? Is it that the resource has other essential functions unrelated to the cost spike? Or is it the identification of the exact resource that's the problem?

When we evolved to SSO and subaccounts with various roles/access, there were resources running in or used by different accounts. I would see the instance in the costs. But then I'd try to find that instance, started by another dev, and get access to shut it off. And even though I'm the main account owner, which to me means I should be able to nuke whatever since I pay the bills, I always had trouble getting to it and getting the permissions for what I wanted.

I used Vantage, which helped me see the problem, but then taking action on it was traumatic.

The barriers are:

- who owns it?

- what service is it in (if it is logs, for example)?

- where is the screen it is on?

- how do I get the permissions to kill it?

Makes a lot of sense - in my opinion AWS doesn't do the best job with this. We had a similar problem with EKS where even as the root user I couldn't view cluster details (https://medium.com/@elliotgraebert/comparing-the-top-eight-m....).

I agree that this would be a great feature. To be honest, our product isn't currently focused on this sort of automatic management of resource lifecycle; we're much better at data collection. But thank you for flagging this! We'll definitely keep it in mind as we add support for compute services (right now we only support S3 and there's nothing to "spin down").

Edit: The part less related to permissions (how can I kill it) and more related to discoverability (which resource is it) is more adjacent to what we've already built and is something we can take a look at soon. Perhaps we can take a crack at the permissions aspect afterwards.