Hacker News
by cma 265 days ago
They had to rely largely on AlphaFold for the data part of the transformer scaling laws, so it's not quite a bitter lesson, but still interesting.