Hacker News
by soupspaces 19 days ago
Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.