Hacker News
user: karinemellata
created: 2020-07-07
karma: 59
submissions:
Alignment is not free: How model upgrades can silence your confidence signals
121 points | 67 comments
We used sparse autoencoders to explain LLM moderation flags of violent threats
6 points | 0 comments