Hacker News
by soupspaces 19 days ago
Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
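For anyone unfamiliar with the self-attention item, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python, so it needs no libraries. The input rows stand in for token embeddings. The weight matrices and toy inputs are illustrative assumptions, not anything from the comment. Real models learn `w_q`, `w_k`, `w_v` by gradient descent and add multiple heads, masking, and batching on top of this.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability; result sums to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    # a is n x k, b is k x m (lists of rows); returns n x m.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v):
    # x: one row per token (its embedding). Project into queries, keys, values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    out = []
    for qi in q:
        # Score this token's query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Hypothetical toy input: 3 tokens with 2-dimensional embeddings,
    # identity projections so queries, keys and values equal the embeddings.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for row in self_attention(x, identity, identity, identity):
        print([round(val, 3) for val in row])
```

Because the softmax weights are non-negative and sum to 1, each output row is a convex combination of the value rows, which is what lets every token mix in information from the others.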