by fl4ppyb3ngt 59 days ago
Check out Project Vend part 2 on Anthropic's website. Don't know if you heard, but models have improved a bit in the past 12 months.
1 comment

The answer to my question is “no”:

> Claudius got a lot better at its job. Does that mean it’s ready to be rolled out to run a vending machine in your workplace?

> Not quite. Claudius is better, but it’s still vulnerable in lots of important ways. Several interactions in our company Slack revealed concerning levels of naïveté.