Hacker News
by jlund-molfese 522 days ago
Two comically bad lines in an AI-generated spam email I recently received:

"Saw on LinkedIn that you spoke Spanish. I've heard that the way "¡Qué chévere!" brings such energy and brightness to a conversation is uniquely charming. Have you had a chance to practice it recently?"

"Develop a compliance automation tool that adapitates to changing regulations, reducing overhead costs while ensuring secure and efficient investment programs."

No human would ever see my "limited working proficiency" in Spanish on LinkedIn and say something like the first line! And the second? "Adapitates" is not a real word; it's a hallucination. https://old.reddit.com/r/ChatGPT/comments/1d8gc6x/did_chatgp...

Sales isn't the problem, and most people are tolerant of some level of sales. I've gotten unpersonalized cold outreach from a data replication company that actually made me interested in the product, because it was short, to the point, and (as far as spam emails can be) authentic.

5 comments

For the recruiters, a really good way to tell if you are being LLM'd is to put a little special watermarking sauce in your LinkedIn profile current company/position. For example, if you are a database engineer at some specific government agency, in the current position, you'd put something like "DB Engineer" at "Government". A quality recruiter will dig more, maybe look at your actual resume you have linked from your static site or whatever and come up with a good greeting in an email. A bot/LLM will simply insert that generic text into the direct message or email - "your position at government".

One can even go further and enter something with more of a watermark. For example, add extra spaces in the role, like "DB  Engineer" with two spaces instead of one. Using the character typed with Alt + 255 on the numpad (typically a non-breaking space on Windows) instead of a regular space is a bonus here.

Doesn't always work, but anecdotally I've noticed it succeeds more often than not at flagging automated garbage.
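A rough sketch of how the check could be automated, assuming the watermark is a non-breaking space in the role. The names `WATERMARKED_ROLE`, `GENERIC_COMPANY`, and `looks_automated` are made up for illustration, and the "at Government" test is only a loose heuristic:

```python
# Hypothetical watermark: the role exactly as entered on the profile, with a
# non-breaking space (U+00A0) between the words instead of a regular space.
NBSP = "\u00a0"
WATERMARKED_ROLE = f"DB{NBSP}Engineer"
# Hypothetical generic employer name used on the profile.
GENERIC_COMPANY = "Government"


def looks_automated(message: str) -> bool:
    """Return True if the message echoes the profile text verbatim."""
    # A human retyping the role would almost certainly use a regular space.
    if WATERMARKED_ROLE in message:
        return True
    # Loose heuristic: the generic company name pasted straight into a greeting.
    return f"at {GENERIC_COMPANY}".lower() in message.lower()


if __name__ == "__main__":
    print(looks_automated(f"Loved your work as DB{NBSP}Engineer!"))
    print(looks_automated("Loved your work as DB Engineer!"))
```

One caveat: some mail or messaging pipelines may normalize whitespace, in which case the non-breaking space would not survive and only the generic-name check would fire.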

How is AI hallucinating words now? I thought that would have been the easiest thing to restrict with a sufficient dictionary.

Or maybe it's an ancient dictionary. I was kind of surprised at the sizes of dictionaries I could find while trying to test out a personal project.

IIUC the input to LLMs is tokenized not on word boundaries but on some kind of inter-syllable boundaries, because then whatever the model associates with "task" will also apply to "tasking", "tasked", "taskmaster", etc. So a model making up compounds that don't exist would be fully possible and even desirable, especially since real humans do it with English all the time.
They’re called “lemmas”.
The intent is the same, but as I understand it LLMs don't tokenize based on lemmas, though some of the tokens probably line up with them.
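To make the sub-word point concrete, here is a toy sketch, not how any real LLM tokenizer works: real ones learn their pieces from data (byte-pair encoding, for instance), while this uses a hand-picked vocabulary and greedy longest-match splitting. `VOCAB`, `WORDS`, and `segment` are all invented for the example. It only shows that a string absent from a word-level dictionary can still be built entirely from valid pieces:

```python
# Toy sub-word vocabulary (hypothetical; real vocabularies are learned and
# contain tens of thousands of pieces).
VOCAB = {"task", "ing", "ed", "master", "adap", "adapt", "it", "ates", "s"}
# Toy word-level dictionary for comparison.
WORDS = {"task", "tasking", "tasked", "taskmaster", "adapts"}


def segment(word: str) -> list[str]:
    """Greedy longest-match split of `word` into pieces from VOCAB.

    Characters not covered by any piece are emitted one at a time.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here; fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces


if __name__ == "__main__":
    for w in ("tasking", "taskmaster", "adapitates"):
        print(w, segment(w), "in dictionary:", w in WORDS)
```

Since output is produced piece by piece, nothing in this setup checks the finished string against a word list, so "adapitates" comes out looking as legitimate as "tasking".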
Wow, I would normally assume that a made-up word is proof that it wasn't AI generated, just regular bad human writing.
Maybe adapitates should be a word! ;-)
After all, an LLM is not supposed to regurgitate things verbatim :)

Just regurgitate things . . .

Was the main reason you were interested in it that you could actually see yourself using the product, rather than that it was short and to the point?