Hacker News new | ask | show | jobs
by DalasNoin 751 days ago
Current systems are already (in a limited way) helping with alignment, anthropic is using its AI to label the sparse features of their sparse auto encoder approach. I think the original idea of labeling neurons by AI came from william saunders, who also left openai recently.