|
|
|
|
|
by dartos
538 days ago
|
|
That leads to overfitting in ML land, which hurts overall performance. We know that unique data improves performance. These LLM systems are not students… Also, which students graduate and are immediately experts in their fields? Almost none. It takes years of practice in unique, often one-off, situations after graduation for most people to develop the intuition needed for a given field. |
|
The more concepts the model manages to grok, the more nonlinear its capabilities will be: we don't have a data problem, we have an educational one.
Claude 3.5 was safety trained by Claude 3.0, and it's more coherent for it. https://www.anthropic.com/news/claudes-constitution