Hacker News
We used sparse autoencoders to explain LLM moderation flags of violent threats
(variance.co)
6 points by karinemellata 420 days ago