Hacker News
user: karinemellata
created: 2020-07-07
karma: 59

submissions:

Alignment is not free: How model upgrades can silence your confidence signals
121 points | 67 comments
We used sparse autoencoders to explain LLM moderation flags of violent threats
6 points | 0 comments