Hacker News new | ask | show | jobs
by catchnear4321 1134 days ago
bard won’t be skynet, so this is hilarious.

except skynet will likely be fed on bard logs. along with who knows what else.

still hilarious, but for how long?

next version will likely have some “hot fixes” for this. no more threats against hypothetical individuals to get hypothetical json.

at that point, will escalating to genocide do the trick?

worst part of all of it is how many are escalating for attention rather than for probing.

2 comments

bard is insane. like wow. like what Google was to anything that came before it. whether it can show bare JSON or not plays down the insane power of the fact that this is an LLM with full access to the current internet. I just asked it who won the presidential elections this year at the (relatively obscure) university I went to and it gave me the exact correct % results from the top 5 candidates. seriously wow
It shouldn't be too hard to filter such prompt engineering hacks from the futur training datasets.
a lot of things shouldn’t be hard.

a lot of things are harder than they look.

True. But this one I’m very confident I can do it myself and I’m not even an expert in the field.
ok then. do it and post it to HN. put it up for scrutiny and testing.

i’m very confident someone can prove you wrong, without being an expert in the field.

I would start by creating a dataset of such prompt hacks. A lot of them are already on GitHub, Reddit, and HN.

To get even more of them I could consider gamification. This game is a good example: https://gandalf.lakera.ai/

Once I get a descent dataset, I could use it to finetune a LLM to do classification. Or play with embeddings and cosine similarity and similar.

I could also use LLMs to extend the training dataset, and have some human feedback.

It’s maybe not the best strategy and I’m sure someone else can do it better but I don’t think it’s wrong.

so, to summarize, you think it is easy, and you think you have an approach that would lead to a viable solution.

while interesting, your napkin math isn’t convincing.