by Detrytus 962 days ago
Is it just me or is "obfuscation" like "john [at] company [dot] com" trivially solved with regular expressions? Or even simple search/replace? Are there more advanced techniques for that?
1 comments

I think the point is that this particular type of obfuscation is just an example, and a regex will only catch that one. If the obfuscation is substantially different, you'll need another regex, which you'll have to write yourself. Whereas the LLM doesn't need to be told about the specific type of obfuscation in use, and can act in a more general way - including against some new types that haven't been used before.
Still, you can get a collection of like 10-20 regexes for the most common types of obfuscation, and that will solve the problem like 90% of the time. And it is much cheaper, computationally, than running an LLM on the whole content.
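To make the regex approach concrete, here is a rough sketch covering just one family of obfuscation: bracketed "at" and "dot" tokens. The pattern names and the `deobfuscate` / `find_emails` helpers are my own invention for illustration, and the email pattern is deliberately simplistic. A real collection would need more patterns like these, one per obfuscation style.

```python
import re

# Hypothetical patterns: match [at], (at), {at} and the same for "dot",
# allowing optional spaces inside and around the brackets.
AT_TOKEN = re.compile(r"\s*[\[\(\{]\s*at\s*[\]\)\}]\s*", re.IGNORECASE)
DOT_TOKEN = re.compile(r"\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*", re.IGNORECASE)

# Simplistic email matcher (an assumption, far looser than the real RFC grammar).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def deobfuscate(text):
    """Replace bracketed at/dot tokens with '@' and '.'."""
    text = AT_TOKEN.sub("@", text)
    text = DOT_TOKEN.sub(".", text)
    return text


def find_emails(text):
    """Return plain-looking addresses found after de-obfuscation."""
    return EMAIL.findall(deobfuscate(text))


if __name__ == "__main__":
    print(find_emails("mail john [at] company [dot] com today"))
```

Anything outside this one family (word puzzles, images, invented schemes) passes through untouched, which is exactly the limitation the parent comment describes.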
I always felt email obfuscation is just a cargo cult and the reduction in spam is only from improvements in anti-spam tech.

I never obfuscated my address and pretty much haven't seen spam since the first days of Gmail. And very little even before then, thanks to SpamAssassin.

Also, raw email addresses can be easily harvested from git repos, mailing list archives and possibly other sources. A lot of technical people who chose to obfuscate have likely posted to one such system at some point.

It is not a cargo cult if you use methods that are more difficult. Can an LLM figure this one out?

abc 132 pyrogenics dndex vufwd bocjz pogl

How about this one?

password vectorization collins 2019 64k little, clotured aerobrakings audiologically cumins ashpans amphibian acciaccatura alligated denunciates burnouts babbles briskier cimbaloms brahmanist adiposes bridgeboards

Obfuscation can be as obscure as you want it to be. If you invent your own, no spammer will take the trouble to figure it out. Then again... not many readers will either.

Your examples are useless because humans would not understand them either.