by redox99 5 days ago
Chain of thought was kind of an obvious solution that everybody knew was necessary by the time ChatGPT / GPT-4 came out. It was just a matter of time before frontier labs actually shipped it.

MoE was also pretty straightforward. It was just a bit surprising how well it worked (that you can get away with only 1/32 of the parameters being active), but most researchers would probably have come up with it on their own.

The truly groundbreaking papers are the first two you mentioned (transformers and GPT-2), and it was also very surprising that InstructGPT worked so well.

1 comment

Reasoning is a little bit more than just "baked in" chain-of-thought prompting. The important takeaway here is that it is not realized at the architecture level of the neural network. You could say that all of these LLM developments were pretty straightforward, but only in hindsight; otherwise there wouldn't have been so much time and effort spent on intermediate steps. A breakthrough means people simply didn't know something before, even if it looks easy in retrospect.