| HN Mirror

This conversation began as a conversation about Claude, which has access to hundreds of thousands of people with no training and no interest in learning about how to prevent Claude from doing damage to society. That makes it materially different from a library: even if an intruder can subvert a library running on servers serving hundreds of thousands of users (e.g., a library for compressing files), the subverted library is very unlikely to be able to start having conversations with a large fraction of those users without someone noticing that something is very wrong.

Although I concede that there are some applications of AI that can be made significantly safer using the measures you describe, you have to admit that those applications are fairly rare and emphatically do not include Claude and its competitors. For example, Claude has plentiful access to computing resources because people routinely ask it to write code, most of which will go on to be run (and Claude knows that). Surely you will concede that Anthropic is not about to start insisting on the use of a sandbox around any code that Claude writes for any paying customer.

When Claude and its competitors were introduced, a model would reply to a prompt, then about a second later it would lose all memory of that prompt and its reply. Such an LLM is of course no great threat to society because it cannot pursue an agenda over time, but the labs are working hard to create models that are "more agentic". I worry about what happens when the labs succeed at this (publicly stated) goal.