by adontz
601 days ago
A little story from personal experience. Most people think of feature flags as boolean on/off switches, maybe per-user on/off switches. If one is testing shades of color for a "Buy Now!" button, that may be OK. For more complex tests, my experience is that not many users tolerate experiments. Our solution was to represent feature flags as thresholds. We assigned a decimal number in [0.0, 1.0) to each user (we called it courage) and a decimal number in [0.0, 1.0] to each feature flag (we called it threshold). That way, not only did we enable more experimental features for the most experiment-tolerant users, but these were the same users, so we could observe interactions between experimental features too. Also, deploying a feature was as simple as raising its threshold to 1.0. User courage was 0.95 initially and could be updated manually. We tried to regenerate it daily based on surveys, but without much success.
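A minimal sketch of how such a threshold check might look. The comment does not state the comparison rule, so this assumes a feature is on when the user's courage is strictly below the feature's threshold. That rule fits the stated ranges (threshold 1.0 enables everyone, threshold 0.0 enables no one) and makes the 0.95 default conservative, but it is a guess. The names `User`, `is_enabled`, `enabled_features` and the feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    # Courage lies in [0.0, 1.0); 0.95 is the initial value mentioned above.
    courage: float = 0.95


def is_enabled(user: User, threshold: float) -> bool:
    # ASSUMED rule: the feature is on when courage < threshold.
    # threshold == 1.0 -> on for every user (courage is always < 1.0)
    # threshold == 0.0 -> on for no one
    return user.courage < threshold


# Hypothetical features with thresholds in [0.0, 1.0].
THRESHOLDS = {
    "new_checkout": 0.3,   # early experiment
    "dark_mode": 0.97,     # almost fully rolled out
    "search_v2": 1.0,      # deployed to everyone
    "unreleased": 0.0,     # visible to nobody
}


def enabled_features(user: User) -> list[str]:
    # Collect every feature whose threshold admits this user.
    return sorted(
        name for name, threshold in THRESHOLDS.items()
        if is_enabled(user, threshold)
    )


default_user = User("default")             # courage 0.95
adventurous_user = User("adventurous", 0.1)

print(enabled_features(default_user))
print(enabled_features(adventurous_user))
```

Under this assumed rule the feature sets are nested: anyone who sees an early experiment also sees every feature with a higher threshold, which is what makes it possible to observe interactions between experiments on the same users.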
We ended up creating just ~100 versions of our app (~100 experiment buckets), and then you could join a bucket. Teams could even reserve sets of buckets for exclusive experimentation purposes. We also ended up reserving a set of buckets that always served as the control group.
You've approached it a different way, and probably a more sustainable way. It's interesting. How do you deal with the bias from your 'more courageous' people?