| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by speedgoose 1133 days ago

I would start by creating a dataset of such prompt hacks. A lot of them are already on GitHub, Reddit, and HN.

To get even more of them I could consider gamification. This game is a good example: https://gandalf.lakera.ai/

Once I get a descent dataset, I could use it to finetune a LLM to do classification. Or play with embeddings and cosine similarity and similar.

I could also use LLMs to extend the training dataset, and have some human feedback.

It’s maybe not the best strategy and I’m sure someone else can do it better but I don’t think it’s wrong.

1 comments

so, to summarize, you think it is easy, and you think you have an approach that would lead to a viable solution.

while interesting, your napkin math isn’t convincing.

I’m sorry I didn’t convince you.

it’s ok. you still could, if you build it.

anything less is banter.