

by marklyon 3551 days ago

I provide guidance to attorneys involved in the discovery process; "Technology Assisted Review" is of huge interest to those teams, as it allows them to leverage coding on a small sample of the population across a much larger set of documents. For many cases, the cost and (occasional) time savings are instantly attractive. Sadly, the process is hard to do well. Far too many screw it up in new and amazing ways.

The author's concerns over machine learning are well-founded. The best option I've been able to identify to ameliorate some of the concerns is focusing on the population that will be suppressed. Once the model returns the desired recall / precision, drawing samples from the excluded population and holding them to a rigorous acceptance standard can help validate whether you've simply built a model around your biases. Couple that with allowing an opponent to validate a randomly selected sample and you've cleared up a lot of the uncertainty in the model.
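A rough sketch of what sampling the excluded population against an acceptance standard could look like. Everything specific here is an assumption for illustration and not part of any particular review protocol: the function names, the 95% confidence level, the 1% threshold, and the choice of an exact (Clopper-Pearson style) upper bound on the rate of relevant documents in the excluded set.

```python
import math
import random


def draw_elusion_sample(excluded_ids, sample_size, seed):
    """Draw a reproducible simple random sample from the excluded set.

    Sharing the seed (an assumption of this sketch) would let an opposing
    party re-draw and check the very same sample.
    """
    return random.Random(seed).sample(list(excluded_ids), sample_size)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return total


def elusion_upper_bound(relevant_found, sample_size, confidence=0.95):
    """One-sided exact upper bound on the share of relevant documents excluded."""
    if relevant_found >= sample_size:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    # binom_cdf decreases as p grows, so bisect for the p where it equals alpha.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if binom_cdf(relevant_found, sample_size, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def accept_excluded_set(relevant_found, sample_size, max_rate=0.01, confidence=0.95):
    """Accept only if the upper bound, not just the point estimate, is under max_rate."""
    return elusion_upper_bound(relevant_found, sample_size, confidence) <= max_rate
```

Under these assumptions, a reviewer would pull the sample with `draw_elusion_sample`, code it by hand, and pass the count of relevant documents to `accept_excluded_set`. Testing the upper bound instead of the raw rate is the "rigorous" part: a small sample that happens to come back clean is not enough to pass.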

It's not perfection, but perfection is a very difficult standard.

1 comment

abofh 3550 days ago

The issue with that approach is ensuring the suppressed are represented. When it's black vs white, you can oversample one and be done.

However, if there's any winner-take-all built into the system, there's a strong incentive to not even acknowledge dissent.
