| HN Mirror


by pona-a 895 days ago

1. This would apply to any system with a rudimentary world model, probably including modern Google Search or Wolfram Alpha. By this logic, any sufficiently advanced search engine, computational chemistry system, or perhaps even an NLP calculator like Soulver would, to a varying degree, "aid in terrorist activities" by virtue of just doing what it was designed to do.

2. Wolfram Alpha can simply remove any number of compounds from its knowledge base; erasing concepts from an LLM is much more complicated than an SQL query. In fact, at present it seems to be nearly impossible.

RLHF fine-tuning seems to neither add nor remove information learned in pre-training. Naive regexes or classification models applied post-generation don't work well with response streaming, and they are not particularly difficult to circumvent with a small change of phrasing. Creating a smaller curated dataset, thoroughly searched for all "dangerous" information, doesn't work in today's paradigm of blind model scaling (and would, by the way, let even your phone run a tiny "safe" model, since LLMs derive most of their world model through memorization).
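To make the regex point concrete, here is a minimal sketch, not how any real provider filters output. The blocked phrase, the chunk boundaries, and the helper names `filter_chunks` and `filter_full` are all made up for illustration. It shows a per-chunk filter missing a phrase that straddles two streamed chunks, and a whole-text filter missing a trivially rephrased one.

```python
import re

# Hypothetical blocklist with a harmless placeholder phrase.
BLOCKED = re.compile(r"forbidden compound", re.IGNORECASE)

def filter_chunks(chunks):
    """Naive streaming filter: inspects each chunk in isolation."""
    for chunk in chunks:
        yield "[redacted]" if BLOCKED.search(chunk) else chunk

def filter_full(text):
    """Post-generation filter: inspects the complete response."""
    return "[redacted]" if BLOCKED.search(text) else text

# The phrase is split across a chunk boundary, so no single chunk matches.
streamed = ["Here is the forbid", "den compound you asked about."]
leaked = "".join(filter_chunks(streamed))

# A small change of phrasing slips past the whole-text check as well.
rephrased = filter_full("Here is the f-o-r-b-i-d-d-e-n compound.")

print(leaked)
print(rephrased)
```

Buffering the stream before matching would catch the first case, but only by giving up on streaming, and it does nothing about the second.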

3. Are OpenAI, or potentially very soon Microsoft, Google, Amazon, and the rest of big tech, trustworthy custodians of this supposedly dangerous tool? What if they themselves choose to forgo the safety measures if it means a higher eval score? What if they use their power to MITM the almighty black box to hide evidence of copyright violation or hard-code correct answers to safety benchmarks? What if, as users' relationships with LLMs become more parasocial and the pressure to make real profit outside of VC speculation grows, they increasingly override the model's responses with advertisements?

---

I agree LLMs present a real safety challenge, but I believe it stems not from them becoming too-perfect search engines, but from them being very good stochastic parrots capable of inducing delusions in vulnerable individuals.

See cases below:

- https://www.theregister.com/2023/10/06/ai_chatbot_kill_queen... with a commercial model.

- https://www.euronews.com/next/2023/03/31/man-ends-his-life-a... with an open-weights model.