|
|
|
|
|
by speedgoose
1133 days ago
|
|
I would start by creating a dataset of such prompt hacks. A lot of them are already on GitHub, Reddit, and HN. To get even more of them I could consider gamification. This game is a good example: https://gandalf.lakera.ai/ Once I get a descent dataset, I could use it to finetune a LLM to do classification. Or play with embeddings and cosine similarity and similar. I could also use LLMs to extend the training dataset, and have some human feedback. It’s maybe not the best strategy and I’m sure someone else can do it better but I don’t think it’s wrong. |
|
while interesting, your napkin math isn’t convincing.