by atonse 211 days ago

Let me clarify: when I conduct interviews, I tell candidates they can do _everything_ they would do in a normal job, including using AI and googling for answers.

But just to humor you (since I did make that strong statement), without googling or checking anything, I would start with basic negated regular expression ranges ([^A-Za-z\s\.\-*], etc.) and do a find-replace on that until things looked coherent without too much loss of words/text.
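A minimal Python sketch of that find-replace idea, for anyone who wants to try it. The allow-list and the name `strip_unexpected` are assumptions for illustration, not anything from the tool being discussed, and the sample string only uses zero-width characters as a stand-in for whatever the obfuscation actually inserts.

```python
import re

# Hypothetical allow-list: ASCII letters, whitespace, '.', '-' and '*'.
# Anything outside it is treated as injected noise and deleted.
DISALLOWED = re.compile(r"[^A-Za-z\s.\-*]")


def strip_unexpected(text: str) -> str:
    """Delete every character that is not in the allow-list."""
    return DISALLOWED.sub("", text)


# Assumed sample: zero-width characters hidden inside ordinary words.
sample = "he\u200bllo wo\u200crld"
print(strip_unexpected(sample))
```

A list this narrow also drops digits, commas, and accented letters, so in practice you would widen the class step by step until the output looks coherent.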

But the problem isn't me, is it? It's the AI companies and their crawlers, which can trivially be changed to get around this. At the end of the day, they have access to all the data to know exactly which Unicode sequences are used in words, etc.

lawlessone 211 days ago

ah i hadn't thought of regex.

true.

It does put the AI companies in the position, though, of continuing to build/code software that circumvents attempts to keep them from stealing content.

Which might be looked upon unfavorably if they are ever dragged to court.

atonse 211 days ago

Good point. Then it's actually an active attempt, right?

Also, I realize my statement was a bit harsh. I know someone probably worked hard on this, but I just feel it's easily circumvented, as opposed to some of the watermarks in images (like Google's, which they really should open-source).

wdpatti 208 days ago

Thanks for all the compliments!

In all reality, I spent like 30 minutes on this one Sunday afternoon, when every model failed nearly 100% of the time. Now it's more like 95%, but about half figure out that there is something wrong and prompt the user to fix it. This isn't meant to be a permanent fix at all, just a cool idea that will be patched just like DANs were back in 2023.

wdpatti 208 days ago

Looks like someone finally got it!
