by IronyMan100
249 days ago
Doesn't this make sense? I mean, LLMs basically learn the part of the data that has low entropy (high information). But then a small subset of training data that contains information completely contrary to the rest of the data set carries "high information", by the definition of entropy.
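To put rough numbers on that last point, here is a minimal sketch using Shannon self-information (surprisal), -log2(p), as the per-sample notion of "information". The probabilities and the name `surprisal_bits` are made up purely for illustration and are not taken from any real training set.

```python
import math


def surprisal_bits(p: float) -> float:
    """Shannon self-information, -log2(p), of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical probabilities, assumed only for this example:
# the majority claim is seen 99% of the time, the contrary one 1%.
p_majority = 0.99
p_contrary = 0.01

majority_bits = surprisal_bits(p_majority)
contrary_bits = surprisal_bits(p_contrary)

print(f"majority sample: {majority_bits:.4f} bits")
print(f"contrary sample: {contrary_bits:.4f} bits")
```

Under these assumed numbers, the rare contrary sample carries far more bits per occurrence than a sample that agrees with the rest of the data, which is the sense of "high information" meant here.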