Hacker News
by kp1197 333 days ago
Does performing gradient descent on token input embeddings lead to interpretable results? And if not, why not?
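To make the question concrete, here is a minimal toy sketch of what "gradient descent on input embeddings" could look like. Everything in it is an assumption made for illustration: the "model" is just a frozen random embedding table plus a fixed linear readout, the gradient is derived by hand, and the names (`embeddings`, `readout`, `optimize_embedding`, `nearest_token`) are hypothetical. It is not how any particular language model or library does this.

```python
import math
import random

random.seed(0)

VOCAB, DIM = 8, 4  # toy sizes, chosen only for illustration

# Frozen "model": a random embedding table and a unit-norm linear readout.
embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
_raw = [random.gauss(0, 1) for _ in range(DIM)]
_norm = math.sqrt(sum(w * w for w in _raw))
readout = [w / _norm for w in _raw]


def score(x):
    """Frozen model output for a single input embedding x."""
    return sum(w * xi for w, xi in zip(readout, x))


def optimize_embedding(target, steps=200, lr=0.05):
    """Gradient descent on the input embedding; the model weights never change."""
    x = [0.0] * DIM  # free continuous vector, not tied to any token
    for _ in range(steps):
        err = score(x) - target
        # d/dx (score(x) - target)^2 = 2 * err * readout
        x = [xi - lr * 2 * err * w for xi, w in zip(x, readout)]
    return x


def nearest_token(x):
    """Index of the vocabulary embedding closest to x (Euclidean distance)."""
    return min(range(VOCAB), key=lambda t: math.dist(x, embeddings[t]))


x_opt = optimize_embedding(target=3.0)
token = nearest_token(x_opt)
loss_continuous = (score(x_opt) - 3.0) ** 2
loss_projected = (score(embeddings[token]) - 3.0) ** 2
```

In this toy, nothing constrains `x_opt` to stay near any row of `embeddings`, so comparing `loss_continuous` with `loss_projected` shows how much is lost when the optimized vector is snapped back to an actual token. Whether the same gap explains the behavior of real models is exactly what the question is asking.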