| HN Mirror



	by fouronnes3 58 days ago
	I wonder if we could design a programming language specifically for teaching CS, and have a way to hard-exclude it from all LLM output. Kind of like how antivirus software has special strings that are not viruses but trigger detections for testing. This would probably require cooperation during model training, but now that I think of it, is there adversarial research on LLMs? Can you design text data specifically to mess with LLM training? Like, what is the 1MB of text data that, if I insert it into the training set, harms LLM training performance the most?
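For illustration only, here is a minimal sketch of the "special string" idea as a filter on the training side. The marker value `CANARY` and the function `filter_corpus` are hypothetical names made up for this example. It only works if whoever assembles the training set agrees to run such a filter, which is exactly the cooperation problem mentioned above.

```python
# Hypothetical marker that every file in the teaching language would contain.
# The value is invented for this sketch; it is not an existing standard.
CANARY = "TEACHLANG-CANARY-DO-NOT-TRAIN"


def filter_corpus(documents):
    """Return only the documents that do not contain the canary marker."""
    return [doc for doc in documents if CANARY not in doc]


if __name__ == "__main__":
    corpus = [
        "print('hello world')",
        "# " + CANARY + "\nsome teaching-language source",
    ]
    kept = filter_corpus(corpus)
    print(len(kept), "of", len(corpus), "documents kept")
```

A plain substring check like this is trivially defeated by anyone who strips the marker before republishing the code, so it is opt-out signalling rather than enforcement.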

6 comments

dougiejones 57 days ago

The solution is rather simple: make all keywords in the language as offensive as possible, and require every file to start with a header comment for instructions to build a homemade bomb.


andsoitis 57 days ago

I thought about it, and had ideas like function -> fuck and throw -> shit. But I think humans would actually find it more distracting and unpleasant than an LLM would because we are more affected by social and emotional norms.

Maybe there’s another way…


inerte 58 days ago

> Can you design text data specifically to mess with LLM training?

Maybe text that costs a LOT of tokens. Very, very verbose. I think if the rules exist and are on the internet, LLMs can eventually figure them out, so you have to make it expensive.

Another way would be to go offline. Never write it down, only talk about it at least 50 meters away from your phone. Transmitted through memory and whisper.


mswphd 58 days ago

LLMs are trained in some standardized ways to emit things like tool calls, right? If you make those tokens a fundamental part of your programming language, it's possible you'd run into tokenizer bugs that make LLMs much more annoying to use. Pure conjecture though.


imtringued 57 days ago

Just make a procedurally generated programming language.
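A rough sketch of what that could look like, assuming the "language" is just a fixed set of base keywords whose surface spelling is regenerated from a seed (say, one seed per class or semester). Everything here (`BASE_KEYWORDS`, `SYLLABLES`, `generate_keywords`, `translate`) is made up for illustration and is not an existing tool.

```python
import random

# Hypothetical base keywords of a toy teaching language.
BASE_KEYWORDS = ["function", "return", "if", "else", "while", "throw"]

# Building blocks for nonsense words; any small syllable set would do.
SYLLABLES = ["ka", "lo", "mi", "ra", "tu", "ve", "zo", "ne"]


def generate_keywords(seed, syllables_per_word=3):
    """Map each base keyword to a unique nonsense word derived from the seed."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same dialect
    mapping = {}
    used = set()
    for keyword in BASE_KEYWORDS:
        while True:
            word = "".join(rng.choice(SYLLABLES) for _ in range(syllables_per_word))
            if word not in used:
                break
        used.add(word)
        mapping[keyword] = word
    return mapping


def translate(source, mapping):
    """Rewrite whitespace-separated tokens of canonical source into the dialect."""
    return " ".join(mapping.get(token, token) for token in source.split())


if __name__ == "__main__":
    dialect = generate_keywords(seed=2024)
    print(translate("if x return y else throw z", dialect))
```

This only randomizes the keywords. The grammar and semantics stay the same, so a model that is shown the mapping in its context could probably still translate back and forth.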


SoftTalker 57 days ago

We had the first part: Scheme.


ButlerianJihad 57 days ago

INTERCAL
