by windowshopping
110 days ago
The part that eludes me is how you get from this to the ability to debug arbitrary coding problems. How does statistical inference become reasoning? For a long time, it seemed the answer was that it doesn't. But now that I use Claude Code daily, it seems it does.

An enormous amount of research and engineering work (most of the work at frontier labs) goes into making that 'correct' modifier happen, rather than just predicting the next token from 'the internet' (the naive original training corpus). This work takes the form of improved training data (e.g. expert annotations), fine-tuning on human feedback (e.g. RLHF), and, most recently, reinforcement learning (e.g. RLVR, meaning RL with verifiable rewards), where the model is trained to find the correct answer to a problem without 'token-level guidance'. RL for LLMs is a very hot research area and is very tricky to get right.
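To make the 'token-level guidance' distinction concrete, here's a toy sketch (not how any lab actually does it). It assumes a "model" that just picks one of three candidate answers to 2+3 via a softmax over logits; the names `verifier`, `supervised_step`, `reinforce_step` and `train_rl` are made up for illustration. Supervised training is told the right output directly, while the RL step (plain REINFORCE with a fixed baseline) only samples an answer and gets a pass/fail score from a checker.

```python
import math
import random

# Hypothetical toy output space standing in for an LLM's possible answers to "2+3".
CANDIDATES = ["4", "5", "6"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def verifier(answer):
    # Verifiable reward: 1.0 if the final answer checks out, else 0.0.
    # Only the outcome is graded; nothing says which tokens to produce.
    return 1.0 if answer == "5" else 0.0

def supervised_step(logits, target_idx, lr=0.5):
    # Token-level guidance: the correct output is handed over directly and
    # cross-entropy raises its logit (gradient of log p = onehot - probs).
    probs = softmax(logits)
    return [l + lr * ((1.0 if i == target_idx else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

def reinforce_step(logits, rng, lr=0.5, baseline=0.3):
    # RL with a verifiable reward: sample an answer, score it with the
    # verifier, and scale the log-prob gradient by (reward - baseline).
    probs = softmax(logits)
    idx = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
    advantage = verifier(CANDIDATES[idx]) - baseline
    return [l + lr * advantage * ((1.0 if i == idx else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

def train_rl(steps=200, seed=0):
    # Start with no preference between answers and learn only from rewards.
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)
```

Calling `train_rl()` should end with most of the probability on "5" even though the correct answer is never shown to the model, only scored. The hard parts in practice (long multi-step outputs, sparse rewards, reward hacking, credit assignment across thousands of tokens) are exactly what this toy leaves out.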