Hacker News
We used sparse autoencoders to explain LLM moderation flags of violent threats (variance.co)
6 points by karinemellata 420 days ago