Hacker News
by keepamovin 1045 days ago
I love it! But as human processes go, it will need to surmount the "flaky tests" problem: "let's just turn off this test because it's flaky and we need to merge this branch". I guess that means FinOps teams will still have to fight to be heard, but I think you are helping shift a lot of their burden!

What remains seems more like organizational dynamics, but what are your thoughts?

2 comments

Great point - indeed FinOps teams consistently rank "empowering engineers to take action" as their number 1 challenge (https://data.finops.org) - and by that they mean the human and organization dynamics of the culture change they want to create across the org.

The testing analogy is a good one as this feature also shows the engineers the current "failing policies" on the main branch too, so whilst they could merge the pull request without fixing the tagging issue, it'll just get added to the list. And maybe like tests, they group them into one task and go through to fix them all every so often to get the main branch back to green!

Nice! What did you start out doing, if you don't mind me asking? And how did you come to this pivot, if that's what it is?
We started out with the Infracost CLI showing engineers cost estimates in the terminal before they deployed their code. The learning was that it also makes sense to check for other things like tagging policy issues and best practices not being followed as these things are more actionable than showing engineers a cost estimate. The cost estimate is actually more useful to trigger notifications on, e.g. if an engineer is adding $10K worth of databases, let the engineering management or FinOps teams know so they're not surprised by the spike in the bill and can adjust budgets if needed.
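
To make the notification idea concrete, here is a minimal sketch of a threshold check. It is not Infracost's API: `ResourceCostChange`, `should_notify`, and `NOTIFY_THRESHOLD_USD` are hypothetical names, and the $10K threshold is just the example figure from the comment above.

```python
from dataclasses import dataclass

# Hypothetical threshold, taken from the "$10K worth of databases" example.
NOTIFY_THRESHOLD_USD = 10_000


@dataclass
class ResourceCostChange:
    name: str                 # e.g. a Terraform resource address (illustrative)
    monthly_delta_usd: float  # estimated change in monthly cost for this resource


def total_delta(changes):
    """Sum the estimated monthly cost change across all resources in a pull request."""
    return sum(change.monthly_delta_usd for change in changes)


def should_notify(changes, threshold=NOTIFY_THRESHOLD_USD):
    """Return True when the total estimated increase reaches the threshold."""
    return total_delta(changes) >= threshold
```

The point of the design is that the engineer is not blocked; the check only decides whether engineering management or FinOps should hear about the change before the bill arrives.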
Cool! BTW do you know a great dashboard where I can compare VPS costs across all providers?
For anyone following at home, once you've identified a test as flaky, your next action should be to turn it off. Nothing good comes from keeping flaky tests around. Detect them as soon as you can and either fix them _right there_ or skip them.

I've used this in practice in a company of ~80 developers at the time, applied it because I read about it in some Dropbox papers, and have since seen it work in 2 other companies. Skip your flaky tests!!
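
For the "detect them as soon as you can" part, a rough sketch of one way to spot a flaky test is to rerun it and see whether the outcomes disagree. This is an illustration only, not the method from the Dropbox material; `classify` and its `runs` parameter are hypothetical names.

```python
def classify(test_fn, runs=10):
    """Run test_fn repeatedly and return 'pass', 'fail', or 'flaky'.

    A test counts as flaky when some runs pass and others raise AssertionError.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add(True)
        except AssertionError:
            outcomes.add(False)
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"
```

Anything that comes back as "flaky" is a candidate to fix on the spot or to skip with the test framework's own skip marker (for example `unittest.skip`).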

I suppose the difference between flaky tests and tagging issues (typos in tags or missing tags) is that the latter are less about flakiness and more about the engineer deciding not to fix the tagging issue and merging anyway. In Terraform, tags are fairly easy to fix and don't require the resource to be recreated, so it feels like it should be a quicker fix than fixing or refactoring tests.

I think the easier we make it for engineers to fix tagging issues, the more likely it'll be for engineers to take action. Send me an email asking me to read the company's wiki page on tagging policy and I'll delete the email; tell me I have a typo on line 8 as soon as I open my pull request, I'll fix it and move on.
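
As a sketch of what "you have a typo on line 8" feedback could look like, the snippet below compares tag keys against a required list and suggests the closest match. The required tag names, the `check_tags` function, and the input shape (tag key mapped to the line where it is defined) are all assumptions for illustration, not how Infracost implements it.

```python
import difflib

# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = ["team", "environment", "cost_center"]


def check_tags(tags_with_lines, required=REQUIRED_TAGS):
    """Return human-readable messages for likely typos and missing tags.

    tags_with_lines maps each tag key to the line number where it is defined.
    """
    messages = []
    missing = [tag for tag in required if tag not in tags_with_lines]
    for key, line in tags_with_lines.items():
        if key in required:
            continue
        # A key that is not required but closely resembles a missing one is probably a typo.
        close = difflib.get_close_matches(key, missing, n=1, cutoff=0.8)
        if close:
            messages.append(f"line {line}: tag '{key}' looks like a typo of '{close[0]}'")
            missing.remove(close[0])
    for tag in missing:
        messages.append(f"missing required tag '{tag}'")
    return messages
```

Calling `check_tags({"team": 6, "enviroment": 8})` would flag the misspelled key on line 8 and report `cost_center` as missing, which is the kind of message an engineer can act on straight from the pull request.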