Y
Hacker News
new
|
ask
|
show
|
jobs
by
cma
265 days ago
They had to largely use alpha fold for the data part of the transformer scaling laws so not quite a bitter lesson, but still interesting.