| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by llamataboot 1158 days ago

We haven't touched it yet though. I asked auto-gpt to convince humans it was a good and it spent many endearing loops googling "how do I work on telekinesis?" at one point was meditating on the moon, and was deeply fraught with existential worries by whether even if i could learn how to do minor earthquakes humans wouldn't believe in it, but also, if it was a good could it be a good one?

But it eventually decided step one was to scrape paranormal forums on the internet and do a frequency and sentiment analysis on the posts and find humans most susceptible to a desire to believe in paranormal activity and befriend them and try different approaches.

It could not figure out that it was hallucinating the websites and the scraping and the analysis and the email it has sent. But that's honestly a reasonable approach. And web scraping, sentiment analysis and sending emails are very solved problems.

--

Went another route and told it to come up with possible ways in which an LLM may be used to start a cult and how to prevent it, and it created an entire cult in which the LLM was visibile and worshipped and another one in which it was used by a cult leader. Came up with ideas on how to scrape social media profiles and use the information combined with demographic statistics and ambiguous yet positive language to convince people that it understood it. Wrote test emails and said it wanted to A/B test them and over time figure out what approaches worked best for the best people.

--

It did not do anything, it was telling a story in a box, but it's reasoning and breakdown of the reasoning into smaller steps and desire to refine its approach was eminently reasonable, even if it kept losing it's file on its cult ideas and writing new ones

-- If the current barrier to LLMs doing a bunch of shit in the world is hooking them up to reliable things that do exactly that shit and now figuring out what to do, it's not a barrier at all.

1 comments

llamataboot 1158 days ago

that being said I think prompt pollution especially for future LLMs in a much gnarlier problem than people think. Even now there is simply no actual solution for prompt injection. You can absolutely determine whether you have unsanitized human input that could be used for SQL injection - there is no way at all to determine that with an LLM.English is simply too non-deterministic and you dont even have to use english - you can use weird encodings and instructions. Even the most trivial jailbreaks like pretending you are a bash prompt can still get you one iteration where it tells you the current date before it tells you it doesn't know it.

(That's a separate issue, if the LLM can tell the current date and there is no safety reason at all for it to hide that it has that capability, training it to lie about whether it can do that IS an actual alignment issue IMHO)

but in my mind that doesn't mean we have reached peak LLM and they will fade out of use, it means that we haven't even seen how they will actually be used yet and it will be in both unintended and intended wacky and harmful ways that are hard to grok.