by Ozzie_osman 1425 days ago
I worked at a self-described "data-driven" company, and the analogy senior leadership liked to make was that the company was like a machine learning algorithm, using data (particularly A/B tests) to do "gradient descent" of the product into its optimal form.

My first take-away was that using data to make decisions is tremendously, tremendously powerful. A/B tests, in particular, can help determine causality and drive any key metric you want in the direction you want to. Short-term, it seems to work great.

Long-term, it fails. Be purely data-driven, without good intuition and long-term bets (that can't be "proven" with data), and the product loses its soul. You can (and should) invest in metrics that are more indicative of the long-term. And you should use data to help guide and improve your intuition.

But data is not a substitute for good judgment, or for a deep understanding of your users and their problems, or of "where the puck is going". It's just a tool. It's a very powerful tool, but if it's your main or only tool, you will lose.

12 comments

I don't have a way to prove this, but I've long suspected that A/B testing might be one of the major culprits behind the modern problem of software churn, i.e., constantly changing GUIs, features, and products. Some percentage of users will always have problems with an application or interface. Constantly chasing every user hurdle might sound like a good idea, but I do wonder if it's similar to listening to too much fan feedback in the case of video games, movies, etc. Implementing every fan suggestion will usually make a terrible game. Instead, the ideal is to figure out which fan suggestions to ignore, and which to listen to. Only some of them will work for a given application.

I wonder if A/B testing is a bit like that. You're suddenly listening to every single issue encountered by users, but lack the wisdom to understand which issues to ignore. As a result, software changes constantly, users cannot really learn a GUI because it changes so frequently, and software teams feel like they're constantly improving things. However, the software itself is not actually getting better when the user's whole experience is taken into consideration.

The churn is the result of the fact that we don't understand what is optimal.

We know how to optimize for things like the shortest distance between two points, but we don't understand how to optimize software design or even a GUI.

This is literally what "Design" means. If something needs to be designed it means there's no theory or notion behind it on what it means to be "optimal." So we "design" a "better" solution but we don't actually know if it's better. So we design another solution and the cycle continues.

A/B testing when done on a population that gives consistent answers should converge on a consistent solution. If the population gives different answers at different times then of course there will be churn.

I would say the methodology of A/B testing is indeed like machine learning: it's quite good and accurate. It's the data source that's the problem. If you have users that don't know what they want, or who behave differently and inconsistently, then your conclusions reflect the data. Perhaps the data is accurate and there is no consistent conclusion, OR the data is inaccurate and you need to get it from a better source other than users telling you what they think is better.

Honestly, this comes off more as a model problem than a data problem.

Here are some dubious assumptions:

* There is an optimal answer.

* Human preferences are transitive (if I prefer A over B and B over C, then I prefer A over C).

* Human preferences are internally consistent.

* Human preferences are stable over time.

Apply these assumptions to finding an answer to the question of what you should make for dinner, and you'll quickly see there are problems.
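
To make the transitivity assumption concrete, here is a minimal sketch in Python. The dinner preferences and the helper name `find_preference_cycle` are made up for illustration; the point is only that a perfectly ordinary set of pairwise preferences can contain a cycle, in which case there is no "best" option for any test to converge on.

```python
from itertools import permutations

def find_preference_cycle(prefers):
    """Return a triple (a, b, c) where a beats b, b beats c, yet c beats a.

    `prefers` is a set of (winner, loser) pairs. Returns None when no
    such three-way cycle exists.
    """
    items = {x for pair in prefers for x in pair}
    for a, b, c in permutations(sorted(items), 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            return (a, b, c)
    return None

# Hypothetical pairwise choices: each pair reads "first preferred to second".
dinner = {("pasta", "curry"), ("curry", "salad"), ("salad", "pasta")}
cycle = find_preference_cycle(dinner)
print(cycle)  # a non-None result means no option beats all the others
```

With a cycle like this, every pairwise "winner" loses to something else, so which option comes out on top depends on which comparison you happened to run last.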

> Apply these assumptions to finding an answer to the question of what you should make for dinner, and you'll quickly see there are problems.

And, to amplify your point a bit, there are problems with A/B testing for which of my children I should love the most, or what I find beautiful or interesting; or should I A/B test out whether enslaving others works out well for me?

I chose extreme examples, but the point is that there are many, many things in human experience that don't lend themselves to easy, simplistic rules. It's genuinely hard to work through a lot of the issues that people face in real life.

With that said, it's also important to work through and to understand objective data as best as possible. For clearly defined and nicely-behaving problems objective data is certainly the way to go. The problem is that a lot of the problems people actually face aren't so easy to understand, and aren't so well-behaved.

This idea is sometimes called the McNamara fallacy:

https://en.wikipedia.org/wiki/McNamara_fallacy

There are a lot of powerful ideas in that article. Every leader of a "data driven company" should read it and tremble.
I did a lot of reading on this particular subject. The opinion I think is most correct is that the war was driven mostly by knee-jerk reactions, and whatever faulty data there was, was never considered.

LBJ didn't want to "lose" Vietnam. LBJ remembered Joseph McCarthy who accused the Truman administration of losing China to communism.

JFK and Ngo Dinh Diem were murdered in 63, Gulf of Tonkin incident in 64, and full US armed intervention by 65. This was a rollercoaster ride that the US military didn't plan for and didn't want.

McNamara picked Westmoreland to lead the military effort. It was a disaster. Lacking a coherent strategy, the effort was doomed from the start. The body counts and #bullets per enemy killed were akin to rearranging the deck chairs on a sinking Titanic. Top of mind was the Korean War, when the US military operated at China's border and China intervened militarily. Hence, US military land operations were limited to South Vietnam. There were no plans to invade North Vietnam.

Nixon and Kissinger decided the end game was to break up the USSR-China alliance. With China on the US' side, the USA no longer had a national interest in what happened to Vietnam. The US left Vietnam in 73 and the country was overrun two years later.

Another fundamental problem with this type of approach is that science is hard.

If you don't know how to properly construct experiments and interpret data it's honestly not much better than hiring an ornithomancer to spot omens in the skies. To be clear, this is something that even scientists with decades of both education and experience sometimes get wrong. If you're coming at it with an MBA, I've got some concerns.

"Intuition" and "judgement" aren't really the best way to think about this.

The entire problem here is that when people say "data-driven", what they actually mean is "model blind". If you make decisions based on a model, nobody will say it's data-driven, it doesn't matter how empirical the model is. Yet, if you rephrase the question as "Do model blind companies win?" the answer is obvious.

Running incoherent experiments testing for each little event you may think of is certainly better than walking at random. But as everybody has known for centuries, you use data to improve your model, and use your model to improve your craft. Jumping over the model part is an incredibly lousy use of data.

Intuition and judgement come from changing the context of the analysis--the time range/history, the details included. Geopolitical and economic forecasting appear to fit this pattern. The question posed about whether data-driven companies win seems simplistic and formulaic to me personally. Often outcomes owe more to good fortune and circumstance than to over-analyzing and using the right approach. Of course, both together tend to be where we'd see success looking backward.
An analogy the company I’m at (Rec Room) has used is “Data can help you climb a mountain, it can’t tell you which mountain to climb.”
Haha. The only thing is that gradient hill-climbing algos are called “local” because they get stuck in local maxima. That’s probably what happened “in the long term”.
100% agree and I think the same issues apply. For our search algorithms we learned how to handle local minima etc. I just wonder if those solutions apply to data-driven companies, since the iteration time and failure cost are so much higher. I'm afraid good intuition will outperform this in the long run.
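
For anyone who hasn't seen the local-maximum problem in action, here's a toy sketch in Python. The `quality` landscape is invented for illustration (a small peak at 2, a taller one at 8), and `hill_climb` is a deliberately naive greedy climber, not anyone's production optimizer. Each step is like shipping whichever variant wins the A/B test against its immediate neighbors.

```python
def hill_climb(f, x, step=1):
    """Greedy hill climbing over integers.

    Repeatedly moves to the better of the two neighbors and stops as
    soon as neither neighbor improves on the current position.
    """
    while True:
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x
        x = best

def quality(x):
    """Hypothetical product-quality landscape with two peaks."""
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8))

print(hill_climb(quality, 0))  # starts near the small peak
print(hill_climb(quality, 5))  # starts on the slope of the tall peak
```

Starting from 0 the climber stops on the small peak and never finds the taller one, because every single step away from it looks like a regression. Starting from 5 it reaches the tall peak. Same method, same data; only the starting point differs.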
Agreed.

Data is useful for proving my decisions once I have made them, so what's most useful is having short feedback loops so I can catch bad decisions quickly, or reinforce good decisions.

>Data is useful for proving my decisions once I have made them

I can't figure out what this means, but I have a feeling it's the exact opposite of how you should be incorporating data into your decision making.

I took it as:

Prove (?), v. t.

1. To try or to ascertain by an experiment, or by a test or standard; to test; as, to prove the strength of gunpowder or of ordnance; to prove the contents of a vessel by a standard measure.

Thou hast proved mine heart. Ps. xvii. 3.

(Webster's 1913)

Which makes sense to me. Trying something you think might work, then testing it, is just plain old experimental science.

Something that also tends to be under-appreciated is the constant bias towards successful A/B testing. Successful A/B tests lead to promotions and other positive employee outcomes; negative A/B tests can lead to dismissal in some places. As such, people will naturally try to beef up the A/B testing result, or ship a "mixed" result and claim success.
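
A rough back-of-the-envelope sketch of why a "mixed" result is weak evidence, in Python. This assumes the metrics are independent and that each is judged at a 5% significance threshold; both are my assumptions for illustration, not something stated above, and real metrics are usually correlated.

```python
def chance_of_false_win(num_metrics, alpha=0.05):
    """Probability that at least one of `num_metrics` metrics looks
    significant when the change truly has no effect on any of them.

    Assumes independent metrics, each tested at threshold `alpha`.
    """
    return 1 - (1 - alpha) ** num_metrics

for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_win(k), 3))
```

Under those assumptions, a test that tracks ten metrics has roughly a 40% chance of showing at least one "significant" improvement from a change that does nothing, which is plenty to claim success with if the incentives point that way.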
With A/B tests, the only decisions you can make are among alternatives that are very close to the base alternative.

Generally, you can only generate data and do experiments based on your present state.

Yep. It is great when you are optimizing a process. But you can't optimize your way out of a wrong direction. In some cases it makes it worse.
>Be purely data-driven, without good intuition and long-term bets (that can't be "proven" with data), and the product loses its soul.

This sounds like post-hoc, anti-intellectual rationalization.

How do you place long-term bets without a model to measure expectations versus outcomes? How do you know what good intuition is? There is a ton of research on "gut" calls that demonstrates it's random.

>But data is not a substitute for good judgment

Good judgement requires data.

> This sounds like post-hoc, anti-intellectual rationalization.

How is this anti-intellectual?

If you want, I can formalize as a game the problem of choosing business/product strategy in a competitive market with a continuous flow of imperfect information. I can then use ideas from controls to establish some upper bounds on what can be inferred from a continuous flow of information. I can then use that result to prove an impossibility result about the game. I can even tweak assumptions to get bounds on probability distributions which imply we'd be better off flipping a coin or whatever.

I'm not going to do the work, because intuition is almost always enough to identify these situations, but it's absolutely clear to me that results like this obviously exist and correspond to many real-world situations.

> Good judgement requires data.

It used to be that insisting on data-driven decision making was a hard pull. Now it's the opposite. Insisting on data where data cannot possibly provide enough signal to make a decision is the new form of anti-intellectualism. IMO.

> Good judgement requires data.

I would say good judgement requires experience. Data may or may not be available and applicable, but its absence doesn't mean one can't exhibit good judgement.

Waiting for data to somehow materialize to support a new action, without actually trying anything new seems like a recipe for just spinning your wheels doing the same things over and over and getting nowhere in a hurry.

Surely the data comes after the action, not prior to it?

Wouldn't this imply Facebook would've built TikTok had they looked at the right data or built the right data models?
I think you’re agreeing with OP?