Hacker News
by BoorishBears 331 days ago
In the BERT era of language models, it was the norm that to get the best performance on a task, you probably needed targeted post-training

As models got bigger and instruction following got better, everyone jumped on the general capabilities of the model + prompting

We're approaching a wall that needs to be overcome with a completely new and unheard-of breakthrough, otherwise we're going to have to go back to specialized post-training (which lends itself to vertical solutions)

I think people are seeing that now with stuff like Devstral being post-trained specifically for OpenHands and massively over-performing for its size at agentic coding