Hacker News
by Scubabear68 1221 days ago
Maybe I am just an old fuddy duddy conservative, but this struck me from the post:

“In the grand scheme of things, one week isn’t that long. But to us, it felt like forever. We are constantly iterating and release multiple changes every day”.

I assume they mean multiple production releases? Is this because the product lacks maturity or stability, or is it just your culture?

I am asking because I am trying to imagine the impact of this on existing customers. It sounds like an awful lot of churn.

This obviously happens a lot in the “you are the product” space like Facebook, Google, etc. But this looks to be a data analytics product with paid tiers. Curious what tooling and processes you have to support this, and how you keep customers happy with this model.

8 comments

I think it’s that you are an old fuddy duddy :P

Actually, if you work with SMBs/enterprises, I agree with you on customer facing changes. In my past life we would ship very frequently (often more than once a day) but always had to feature flag changes that large clients might see or be affected by. Even something as simple as tweaking the layout of a core flow could cause support headaches and angry customers — customers worth 10s of thousands of dollars per month. Is it worth losing a customer to CD a new button placement?

I can only imagine how clean code that is full of feature flags looks & works. Glad that I don't need to do that too often :)
The right approach is to immediately remove the flags after rollout…

The actual approach is to maintain a million fucking feature flags, ensuring that almost all possible combinations are essentially untested… better hope you did a good job separating concerns!
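To put a rough number on "almost all possible combinations": with n independent boolean flags there are 2^n on/off combinations. A minimal sketch, assuming plain on/off flags with no dependencies between them (the flag names are made up for illustration):

```python
from itertools import product


def flag_combinations(flags):
    """Yield every on/off assignment for the given flag names."""
    for values in product([False, True], repeat=len(flags)):
        yield dict(zip(flags, values))


# Hypothetical flag names, purely for illustration.
flags = ["new_checkout", "dark_mode", "beta_search"]
combos = list(flag_combinations(flags))
print(len(combos))  # 2 ** 3 combinations for three flags

# The count doubles with every flag that is left in the code.
for n in (5, 10, 20):
    print(n, 2 ** n)
```

Removing a flag after rollout halves the space again, which is the argument for cleaning them up promptly.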

They keep adding them until they need an internal library to manage their collection of feature flags.

Then after a while they graduate to feature flags as a service (which a bunch of cloud services are trying to have a go at)

This is considered the norm for high performing product teams in the modern day.

We keep customers happy because we push changes live incrementally, reduce our chances of major outages and improve our response time when they do occur.

As a customer, if I find a competitor that does not do this, then I will switch to it.

For example, I cancelled my netflix subscription because they are unable to reliably operate microservices, and the UI was always in some semi-broken state. As a software engineer, this stressed me out during my relaxing TV time.

Even if continuous delivery is somehow reliably delivered, if the changes are customer visible, then they break my muscle memory, and increase my cognitive load -- I have to re-learn the damned UI every fucking time I log in. If the changes are not customer visible, then what business value do they deliver?

And yet the numbers show that companies that continuously deliver updates massively outperform those that don’t. You claim to make decisions based on a company’s engineering practices, but I can guarantee that you have no idea what the engineering practices of the companies you do support actually are.
Wait, can I see these numbers grouped by industries if possible? Please?
Yes. Read Accelerate and read the State of DevOps reports from 2017 through 2022. The reports have the data and explain their methodologies for evaluating said data. It's all there for you to consume.
Oh, so these reports are just surveys... I was hoping for some hard analysis/statistics.
Read any book on the subject, they’ve done the research for you. The Phoenix Project, The State of DevOps Report.

Spend some time in companies that move slow vs fast and you’ll see the difference in their success first hand. You’ll see the metrics on their incidents and severity and customer satisfaction with them.

Oh, and the fact two companies mentioned (Google and Facebook) are two of the most successful companies on earth.

See my comment elsewhere taking “people are the product” out of this conversation.

Facebook and Google and like companies do not care if they piss users off. They churn features constantly, break people’s flows on a regular basis, A-B test features so different people get different experiences.

They get away with this because the “users” aren’t users, they and their data are the product. You pay nothing to use their services, and you get what you pay for.

Look, you're not convincing at all.

1. Should I skip books that do not confirm your point of view?

2. What if I already spent time in these companies and found no big difference?

3. So why do companies that use a different approach still exist, and why are they even (mamma mia) profitable?

This just sounds like Continuous Delivery. We never achieved it in my last job, so I can't speak from experience, but my understanding is that typically "deploy" is separated from "release" using feature flags of some kind.
The article starts with “Last year, we made the difficult decision to stop deploying any changes to production for one week” and goes on to talk about releases.

In that context I assume this means they make multiple production releases per day (which makes me shudder). I am curious how they do this while maintaining high quality and not driving customers insane.

Hey, Al from Tinybird here (co-author of the post). We've made up to 20 production releases per day some days. It's transparent to our users, they aren't even aware the upgrade is happening, there's no upgrade button to hit, there's no downtime. We release often because we release small and fast. It's not like those 20 releases are always fundamentally changing the product. We would rather fix a minor bug or two and get that out to our customers ASAP, than hold on to it for a few months and drop a huge change. In a vast majority of cases, a user won't even consciously notice something changed.

Doing this kind of fast iteration has its risks, but it has its benefits too. We de-risk it, in part, by having extensive CI, which is why it was so important to us that the CI is fast & reliable.

Delivering larger, less-frequent updates has its own risks. You're not practising your release process as frequently, so it's a much bigger event. You're pushing many, many more changes in one go, so there's a lot more surface area for something to go wrong, and rolling it back is a much bigger job. And dropping many/bigger changes to the user experience is much more noticeable.

Again, this isn't the right process for everyone, but it works for us and it's how we've managed to build a product that delivers value to our users.

It's interesting how you differ between "deploy ASAP, within half a day" and "keep the fix a few months back".

Like, is there nothing in between? Like once a week, once every other week?

If you had to estimate, wouldn't there be fewer bugs if you deployed less quickly (and used that time for validation)?

Sure :D

I've worked on products across the spectrum, Enterprise software that releases twice a year, smaller stuff that ships once a month, and now a SaaS that releases many times a day. I don't have the data to compare, but I genuinely don't believe the rate of bugs was materially different between any of them.

We make faster changes, but the changes are much smaller, and so the surface area to test is smaller. The more changes you make, the more time you need to validate. If you make a week's worth of changes, you're going to need an appropriately longer validation cycle than if you make 1 change. It scales, and in my experience, many products with slower release cycles aren't appropriately scaling up the validation time to match.

That said, this doesn't necessarily mean we're writing code and 30 minutes later it's in production... there's still an iterative dev cycle with lots of validation happening... but if something is ready to go, ship it!

Thanks for the response Al!

Very interesting. I agree different strokes for different folks, you guys seem to be on the extreme end of CI/CD.

Have you done any sort of analysis you could share on what it costs to release up to 20 times per day?

We do track the real $ cost of time & materials, but tbh I don't think it's anything too exciting. I'll see what we can share!
> multiple production releases per day

Most (but not all) SaaS businesses are expected to these days, so I'm curious what business/industry you're in where not only do you not do this, but the idea gives you the shudders.

I consult across a few different industries, but it includes SaaS offerings. Many places could do multiple deploys to prod a day, but choose weekly releases or other longer cadences. This is to allow for documentation, client notifications, etc. It is also more efficient: constantly releasing requires a lot of resources, as this blog implies they are churning dozens of K8s pods several times a day.

I would think constant releasing would also make debugging prod issues pure hell.

It sounds like continuous deployment, not continuous delivery.

Continuous deployment deploys code to production frequently, as soon as it's ready.

Continuous delivery has some ready-to-deliver branch that's constantly being updated as above, but it's not deployed to production until someone (Product Owner?) or something (Yay - end of sprint!) triggers it.

Different people may use the word release for at least this many things: 1) a deployment, 2) an unveiling via feature flags, 3) a public announcement despite the code already having been live.
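The gap between meanings 1) and 2) can be sketched in a few lines. This is only an illustration under stated assumptions: `FlagStore` is a hypothetical in-memory stand-in for whatever flag library or service a team uses, and the flag and customer names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory stand-in for a feature flag service."""
    enabled_for: dict = field(default_factory=dict)  # flag name -> set of customer ids

    def enable(self, flag, customer_id):
        self.enabled_for.setdefault(flag, set()).add(customer_id)

    def is_enabled(self, flag, customer_id):
        return customer_id in self.enabled_for.get(flag, set())


def render_checkout(flags, customer_id):
    # Both code paths are deployed; the flag decides who the new one is released to.
    if flags.is_enabled("new_checkout_layout", customer_id):
        return "new layout"
    return "old layout"


flags = FlagStore()
before = render_checkout(flags, "big-client")      # deployed, not released to anyone
flags.enable("new_checkout_layout", "beta-tester")
beta_view = render_checkout(flags, "beta-tester")  # released to one customer only
after = render_checkout(flags, "big-client")       # unchanged for everyone else
print(before, beta_view, after)
```

Here the deployment happened once, while the "release" is just a data change in the flag store, which is why the two words drift apart in conversations like this one.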

Yep, it's part of our culture: we do many releases per day to constantly iterate on things. Also, as in other projects, there is maintenance and bug fixing that we want to bring to production as soon as possible.

Our context is that of a startup that is constantly validating things. Also, in our context a release does not necessarily mean releasing to the users; sometimes stuff is behind feature flags or out for beta testing.

Hey there! I'm an old fuddy duddy, too.

Continuous deployment has been around long enough that even IBM (remember never getting fired for buying IBM?) talks about it.

https://www.ibm.com/topics/continuous-deployment

"Dark deploys" and "feature flags" are often used to keep customers safe from incomplete features while still giving all of the advantages of CD plus allowing testing in production.

I'd never heard of Flagship, but this is a nice writeup on that (kudos, Flagship.io):

https://www.flagship.io/glossary/dark-launch/

Mh, I'm interacting with teams with wildly different release strategies and stability requirements at work.

One of the more fundamental things actually pushing towards faster releases is what I call the relativistic deployment speed. We have products that will need at least 2 months to get a remotely deployable version ready. The average fast hotfix usually takes more like 4 months until an installation on a prod system actually can start. Our fastest products can go from code to prod in like 15 minutes with the automated tests being the bottleneck.

This in turn shapes choices for the product managers, but also for security. If something like Log4shell hit these slow products, I'd have to plan to be vulnerable for two months at least, and usually more like 4 - 8 months depending on the customers. I have no choice, because that's their light speed of deployment. No code goes to prod faster than two months latency. That, quite frankly, fucking sucks.
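As a back-of-the-envelope sketch of that "light speed" idea: the earliest a fix can reach prod is the disclosure time plus the pipeline's minimum lead time. The timestamp below is arbitrary, and the lead times are just the rough figures from this comment (2 months approximated as 60 days).

```python
from datetime import datetime, timedelta


def earliest_fix_in_prod(disclosed_at, lead_time):
    """A fix cannot reach prod sooner than the pipeline's minimum lead time."""
    return disclosed_at + lead_time


disclosed_at = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp

# Assumed lead times, taken loosely from the figures in the comment.
pipelines = {
    "slow product": timedelta(days=60),     # "at least 2 months"
    "fast product": timedelta(minutes=15),  # "code to prod in like 15 minutes"
}

for name, lead_time in pipelines.items():
    print(name, "->", earliest_fix_in_prod(disclosed_at, lead_time))
```

Everything else (testing effort, customer scheduling) only adds to that lower bound.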

Other products were much better in that situation. We were lucky to have the right devs around, but we went from the decision to emergency-patch Log4Shell at the riskiest possible speed to the first of many Log4Shell patches in prod within 30 minutes.

However, that's not the normal speed, and that's when you get into the second decision area. Given a lightspeed of deployment, how fast do you want to go?

Some of our potentially faster-moving products are B2B products, with a lot of internal training for support and consulting going into a release, and also training at customers happening for larger customers. This means product chooses to only release bigger changes and heavily customer-visible changes every 6 weeks. They could do this a lot faster, but they choose to slow down because it fits their customers well. And, for example, December is usually frozen entirely because customers want it that way.

But then there is the third decision area. What happens if there is an entirely customer-invisible change, such as an optimization in database handling, some internal metric generation for an optimization, or an internal change to prepare a new feature for the next scheduled rollout? And we have the tested, vetted and working option to just push that into prod without downtime, which also gives us the opportunity to build experience with, and confidence in, our no-downtime deployment system? I don't see a reason why I wouldn't exercise this at least once a day.

Read the State of DevOps reports over the years and you'll see why this is the direction we're all heading now. It turns out all that safety we thought we were building by making complex commit flows to multiple branches and environments was not only more complex than it needed to be but has also slowed us down and not made things better. Trunk-based development is back again, and this time with data. Push early, push often, push small changes and iterate quickly. It's not just easier but it also seems to increase quality. (There are a lot of reasons why this turns out to be true. Read Accelerate. I won't do a better job explaining in a comment.)
That’s continuous delivery, right? You make great tests and you should feel comfortable releasing after review.
In my experience and to parent’s point, it’s not about your comfort, it’s about documenting, notifying clients, updating support, etc. All the non-code parts of selling software. As you suggest, if the code has been reviewed, tested and merged, it “should” be ready to go. Right?