| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by speedgoose 1133 days ago
	True. But this one I’m very confident I can do it myself and I’m not even an expert in the field.

1 comments

ok then. do it and post it to HN. put it up for scrutiny and testing.

i’m very confident someone can prove you wrong, without being an expert in the field.

I would start by creating a dataset of such prompt hacks. A lot of them are already on GitHub, Reddit, and HN.

To get even more of them I could consider gamification. This game is a good example: https://gandalf.lakera.ai/

Once I get a descent dataset, I could use it to finetune a LLM to do classification. Or play with embeddings and cosine similarity and similar.

I could also use LLMs to extend the training dataset, and have some human feedback.

It’s maybe not the best strategy and I’m sure someone else can do it better but I don’t think it’s wrong.

so, to summarize, you think it is easy, and you think you have an approach that would lead to a viable solution.

while interesting, your napkin math isn’t convincing.

I’m sorry I didn’t convince you.

it’s ok. you still could, if you build it.

anything less is banter.