Experiment, Measure, Repeat

	Experiment, Measure, Repeat (blog.buildo.io)
	65 points by ecaml 3002 days ago

4 comments

oftenwrong 3002 days ago

"avoid maintaining useless things"

If your company is small, heed this warning. If you're spending a lot of time on maintenance of non-essential things, and struggling with supporting legacy things while creating new things, you are missing a huge opportunity. Right now, "trimming the fat" is as easy as it will ever be. As your company grows, it will only become more difficult. Be ruthless now with maintaining focus on things that "move the needle", and with killing things that don't.

Analogy:

It's like you're gearing up for a long hike. There is a natural tendency to take things along "just in case", even if you know you probably won't need them. At the trailhead, you could easily leave some of that non-essential stuff in the car. You try on your pack, and it doesn't feel that heavy. You think "what's the harm?" Of course, you are still fresh and full of energy at the trailhead. A few days into your hike, you start to really feel the weight of that extra stuff. The pack digs into your shoulders and hips. Your entire lower body is sore. You regret bringing the extra stuff, but now that you are in the wilderness, you cannot just dump it. You have to carry it back with you, and you wish you had exercised restraint when you had the chance.

maxxxxx 3001 days ago

I am working more and more on this. We have a ton of legacy code we drag along because nobody understands it so it's too scary to touch. I have started to push the idea that it's simply not acceptable to have code we don't understand. We have now refactored several parts that were painfully complex and convoluted by analyzing what they really do and then rewriting or changing. Most of the time it is not that difficult once you commit to the task.

nnutter 3002 days ago

They measured long enough in this specific example that they avoided the problem, but something I see people repeatedly forget is to establish a baseline before making a change. If you don't know how variable something is before you make your change, you might naively think you made something better or worse when the difference is within normal variability.
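
The baseline check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly message counts are invented (they are not the article's data), `within_baseline` is a hypothetical helper, and the threshold of two standard deviations is an arbitrary rule of thumb rather than a formal test.

```python
from statistics import mean, stdev


def within_baseline(baseline, after, k=2.0):
    """Return True if `after` is within k sample standard deviations
    of the baseline mean, i.e. indistinguishable from normal noise
    under this (rough) rule of thumb."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    return abs(after - mu) <= k * sigma


# Hypothetical weekly message counts measured BEFORE the change.
baseline_weeks = [410, 455, 390, 430, 470, 405]

# Hypothetical count for one week AFTER the change.
after_change = 380

if within_baseline(baseline_weeks, after_change):
    print("Within normal variability; no evidence of a real change.")
else:
    print("Outside the baseline range; worth a closer look.")
```

With these made-up numbers the drop to 380 still falls inside the baseline range, which is exactly the situation where it is tempting to declare an improvement that is not there.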

ecaml 3002 days ago

Good point, and measuring things as they are before experimenting can also serve as a valid motivation to start an experiment.

cjf4 3001 days ago

This sounds an awful lot like the lean/six sigma (lss) tool set that every company of a certain size and age has experience with. But that's not to invalidate the ideas here, as lss can unfortunately be prone to a cultish zealotry that mutates the original principles.

mwexler 3001 days ago

Is it just me, or is this example not an experiment? There were just pre- and post-change measures, with no comparison group. The measure of success was "use more threads", which was the same as the treatment, instead of the actual goal: improving the perception that a Slack channel was faster and easier to read (and potentially improving productivity: faster ship, more tests passing 1st time, etc.).

A better method might have been something like: pick two channels with equal traffic and relevance to the business, require one channel to emphasize threads and respect quiet, let the other go on as it is, and compare the groups at the end on the actual metric of concern (for users in each group, a quick check/survey of perceived utility, ease, and value of the channel). You could even take the same measure at the beginning as well, to show a change-over-time comparison. Still not random selection, but better.
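
As a rough sketch of how the end-of-trial comparison could be analyzed, here is a two-sided permutation test on the difference in mean survey scores between the two channels. Everything specific here is an assumption for illustration: the 1-to-5 scores are invented, `permutation_test` is a hypothetical helper, and a permutation test is just one reasonable choice among several. It also does not fix the non-random selection of channels.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns (observed_difference, p_value). The p-value estimates how
    often a difference at least as large as the observed one appears
    when the group labels are shuffled at random.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical 1-5 "how useful is this channel?" survey answers.
threads_channel = [4, 5, 3, 4, 4, 5, 3, 4]   # channel asked to use threads
control_channel = [3, 4, 3, 2, 4, 3, 3, 4]   # channel left as it was

diff, p_value = permutation_test(threads_channel, control_channel)
print(f"difference in mean score: {diff:.2f}, p-value: {p_value:.3f}")
```

A small p-value would suggest the gap between the channels is unlikely to be label-shuffling noise; it would still say nothing about confounders such as the two channels having different members or topics.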

"But that's a lot more work", some might say. But without this extra work, there is no actual experiment. The test just says that threads are good and when we ask folks to use them, they do. But look at the data; Slack message count decreased from Q4 17 to Q1 18; any change in actual utility could be due to seasonality, shipping vs. bug bashing, other changes that resulted in fewer messages so threading wasn't needed; maybe threading caused fewer messages, maybe threading caused people to choose not to comment when they would have otherwise... but we can't tell from this design.

I'm not saying there's anything wrong with iterating. But call it that: "Iterate and change, measure, repeat if change correlated with goodness". A formal Experiment is designed to show that the change you made _caused_ a change in something else, something important to you or your business. Without a formal experiment, you just have correlation, hope, tribal knowledge, instinct, experience... all great things, none of which support causation.

And not everything needs this level of rigor, and that's totally fine; maybe that's the case in the top post's example. But if a change is expensive in terms of workflow, effort, or actual expense, perhaps it's worth doing a more structured test before committing.

If you've never had an experimental design experience, try reading anything at http://exp-platform.com/ (Kohavi's work at Microsoft) or search for DOE design of experiments at your favorite search engine; also articles on "A/B Testing" often give suggestions on how to best structure a controlled experiment. And recognize that most older work focuses on traditional ANOVA and t-tests, but there are all sorts of other modern ways to assess impact.

(edit: corrected typos)

ecaml 3001 days ago

Thanks for the advice!

You're right: scientifically/formally speaking this cannot be considered an experiment but, in my opinion, for small and easy-to-change processes, there is no need to be formal. The goal sometimes is just to solve small pain points and make people happier at work, and in such cases the fact that people perceive that things are better after the experiment could be enough :) Obviously, this was just a simple case study and every perceived improvement could be due to confounding factors, as you say, or just to the fact that the experiment created awareness about a problem. Obviously, if an experiment touches something more relevant than slack messages, being more formal is a good thing and A/B testing is for sure a better approach.

The point of the post was just to make people aware of the fact that changes in a team can happen without too much pain and that continuous improvement and experimentation are processes that can be implemented easily.

rich_ard 3001 days ago

Did you measure, though, that the staff was happier or more productive? Did any meaningful change occur as a result of the requests to use Slack differently?

Looking at your graph, it's not really clear that anything has changed significantly: when did the change get implemented? Is a couple weeks' change in activity a result of a couple of loud employees going on leave? There's such variability there that it's hard to say that anything has really changed.

I agree that a formal analysis isn't always necessary, but you don't appear to have done _any_ here, and are then couching a suggestion to try something on Slack as having measured something. That's just bad science! :D

ecaml 3001 days ago

That's not science :)

However, you're right: in the article, it's not clear how we decided to consider the experiment successful. We called a meeting with the whole team and asked everyone something like: "Do you think that the situation on Slack is better now?", and the majority of people said "yes".

I added a note at the end of the article to address some of the comments ;)
