by mwexler 3002 days ago
Is it just me, or is this example not an experiment? There were just pre- and post-change measures, with no comparison group. The measure of success was "use more threads", which was the same as the treatment, instead of the actual goal: improving the perception that a Slack channel was faster and easier to read (and potentially improving productivity: shipping faster, more tests passing the first time, etc.).

A better method might have been something like: pick two channels with equal traffic and relevance to the business, require one channel to emphasize threads and respect quiet, let the other carry on as before, and compare the groups at the end on the actual metric of concern (for the users in each group, a quick check/survey of perceived utility, ease, and value of the channel). You could even take the same measure at the beginning as well, to compare the change over time. Still not random selection, but better.
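To make the comparison concrete, here is a minimal sketch of that two-channel, before-and-after design as a difference-in-differences calculation. The survey scores are made up for illustration (a hypothetical 1-5 "how useful is this channel?" rating), and `diff_in_diff` is my own helper name, not anything from the article.

```python
from statistics import mean

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the comparison group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change

# Hypothetical survey ratings (1-5), invented purely for illustration.
threads_pre = [3, 2, 3, 4, 3]    # threaded channel, before the change
threads_post = [4, 4, 3, 5, 4]   # threaded channel, after the change
control_pre = [3, 3, 2, 4, 3]    # untouched channel, same dates
control_post = [3, 4, 3, 4, 3]

effect = diff_in_diff(threads_pre, threads_post, control_pre, control_post)
print(f"Estimated effect of threading: {effect:+.2f} rating points")
```

The comparison channel soaks up whatever changed for everyone (seasonality, release crunch, and so on), so only the extra change in the threaded channel is attributed to threading. Without random assignment this is still only suggestive.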

"But that's a lot more work", some might say. But without this extra work, there is no actual experiment. The test just says that threads are good and when we ask folks to use them, they do. But look at the data; Slack message count decreased from Q4 17 to Q1 18; any change in actual utility could be due to seasonality, shipping vs. bug bashing, other changes that resulted in fewer messages so threading wasn't needed; maybe threading caused fewer messages, maybe threading caused people to choose not to comment when they would have otherwise... but we can't tell from this design.

I'm not saying there's anything wrong with iterating. But call it that: "Iterate and change, measure, repeat if change correlated with goodness". A formal Experiment is designed to show that the change you made _caused_ a change in something else, something important to you or your business. Without a formal experiment, you just have correlation, hope, tribal knowledge, instinct, experience... all great things, none of which support causation.

Not everything needs this level of rigor, and that's totally fine; maybe that's the case in the top post's example. But if a change is expensive in terms of workflow, effort, or actual expense, perhaps it's worth doing a more structured test before committing.

If you've never worked with experimental design, try reading anything at http://exp-platform.com/ (Kohavi's work at Microsoft) or search for "DOE" (design of experiments) in your favorite search engine; articles on "A/B Testing" also often give suggestions on how best to structure a controlled experiment. And recognize that most older work focuses on traditional ANOVA and t-tests, but there are all sorts of other modern ways to assess impact.
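As one example of a non-t-test approach, here is a rough sketch of a two-sided permutation test on the difference in group means, using only the standard library. The function name, its parameters, and the sample data are my own assumptions for illustration; none of it comes from the article.

```python
import random

def permutation_test(a, b, n_resamples=5000, seed=0):
    """Approximate two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle group labels: under the null hypothesis they are arbitrary.
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical survey ratings (1-5) for two channels, invented for illustration.
threaded = [4, 4, 3, 5, 4, 4, 5, 3]
control = [3, 4, 3, 4, 3, 2, 3, 3]
p_value = permutation_test(threaded, control)
print(f"p-value: {p_value:.3f}")
```

It asks how often a difference at least as large as the observed one shows up when the group labels are shuffled at random, which avoids the normality assumption behind a t-test.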

(edit: corrected typos)


Thanks for the advice!

You're right: scientifically/formally speaking this cannot be considered an experiment but, in my opinion, for small and easy-to-change processes, there is no need to be formal. Sometimes the goal is just to solve small pain points and make people happier at work, and in such cases the fact that people perceive that things are better after the experiment could be enough :) Obviously, this was just a simple case study, and every perceived improvement could be due to confounding factors, as you say, or just to the fact that the experiment created awareness of a problem. Obviously, if an experiment touches something more important than Slack messages, being more formal is a good thing and A/B testing is for sure a better approach.

The point of the post was just to make people aware of the fact that changes in a team can happen without too much pain and that continuous improvement and experimentation are processes that can be implemented easily.

Did you measure, though, that the staff was happier or more productive? Did any meaningful change occur as a result of the requests to use Slack differently?

Looking at your graph, it's not really clear that anything has changed significantly: when did the change get implemented? Is a couple weeks' change in activity a result of a couple of loud employees going on leave? There's such variability there that it's hard to say that anything has really changed.

I agree that a formal analysis isn't always necessary, but you don't appear to have done _any_ here, and are then couching a suggestion to try something on Slack as having measured something. That's just bad science! :D

That's not science :)

However, you're right: in the article, it's not clear how we decided to consider the experiment successful. We called a meeting with the whole team and asked everyone something like: "Do you think that the situation on Slack is better now?", and the majority of people said "yes".

I added a note at the end of the article to address some of the comments ;)