| HN Mirror


by CityOfThrowaway 53 days ago

I dunno what use case you're thinking this is for.

The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.

Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.


traceroute66 53 days ago

> Think, ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.

Credit card numbers are deterministic. A five-year-old could write a script to strip out credit card numbers.
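For reference, the kind of script being alluded to might look like the sketch below. It assumes the numbers appear as written digits (13 to 19 of them, optionally separated by spaces or hyphens) and uses the Luhn checksum to cut down on false positives. The names `CARD_CANDIDATE`, `luhn_valid`, and `redact_cards` are made up for illustration, not taken from any product discussed here.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, mask: str = "[REDACTED CARD]") -> str:
    """Replace Luhn-valid card-like digit runs with a mask (hypothetical helper)."""
    def _sub(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return mask if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_sub, text)


print(redact_cards("my card is 4111 1111 1111 1111 ok"))
```

This only covers numbers that are already written out as digits; it says nothing about numbers spoken aloud in a transcript.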

As for other PII? You're seriously expecting an LLM to find every instance of every random piece of PII? Worldwide? In multiple languages? I've got an igloo I'd like to sell you...


CityOfThrowaway 53 days ago

I think this is a bit dramatic of a comment. Credit card numbers relayed over the phone are not deterministic...

"four three uh let's see sorry my vision is bad six eight..."

Easy versions of problems are easy. But reality is messy.
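To make the point concrete, here is a minimal sketch of the naive rule-based approach to spoken digits: map English number words to digits and ignore everything else. The word table and the `spoken_digits` name are assumptions for illustration. It handles the example above, but it also shows where the approach starts to break.

```python
import re

# Hypothetical lookup table: English digit words only, plus "oh" for zero.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def spoken_digits(transcript: str) -> str:
    """Collect digit words in order, skipping every other token."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return "".join(WORD_TO_DIGIT[t] for t in tokens if t in WORD_TO_DIGIT)


print(spoken_digits("four three uh let's see sorry my vision is bad six eight"))
# The same rule also fires on ordinary speech:
print(spoken_digits("oh sorry, one moment"))
```

The first call recovers "4368", but the second returns "01" from a sentence that contains no card data at all. Telling those cases apart needs context, which is where simple rules run out.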

And no, neither I nor anybody else is expecting a 50B parameter model to find every instance. But finding 90% or 95% or 99% is pretty good, and sufficiently good for many use cases.


traceroute66 52 days ago

> Credit card numbers relayed over the phone are not deterministic...

I don't know the last time you relayed card details over the phone, but the last 100 times I did it, the agent did one of two things:

    (a) Said "Please wait while I turn off recording"; or
    (b) Transferred the call to an automated system that read the card details via the phone keypad input and then took back control of the call afterwards.

Relaying card details over the phone is a problem that has been comprehensively solved. You don't need an LLM for it!

> But finding 90% or 95% or 99% is pretty good

I would humbly suggest that you are over-estimating the capabilities of an LLM. ;)
