by chawco
550 days ago
While I agree on the whole, some things, like credit card numbers, are easy to catch even if you accept a small false positive rate. I do think it's acceptable to do PII detection probabilistically for some classes of identifiers/quasi-identifiers, because you can't really do any better without crazy false positive rates. But things like credit card numbers have enough structure that doing it entirely via an ML model is more work, with a higher chance of failure, than just building a simple heuristic. Add to that the fact that missing a credit card number is way higher stakes than missing something like a zip code, and you can understand why something like this is just not acceptable in a product like this, given the resources Microsoft has at its disposal.
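
To make "simple heuristic" concrete, here is a rough sketch of the kind of thing I mean, not a claim about what any real product does. It assumes card numbers are 13 to 19 digits, optionally split by single spaces or hyphens, and it uses the Luhn checksum (the check digit scheme payment card numbers carry) to throw out most random digit runs. The names `CANDIDATE`, `luhn_valid`, and `find_card_numbers` are made up for illustration.

```python
import re

# Assumption: a candidate is 13 to 19 digits, optionally separated by single
# spaces or hyphens, and not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    print(find_card_numbers("pay with 4111 1111 1111 1111 today"))
```

Roughly one in ten random digit strings of the right length will still pass Luhn, so this has a small false positive rate, which seems like the right trade given how costly a missed card number is. A probabilistic model can still handle the fuzzier identifiers.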