by jimmySixDOF
247 days ago
I liked how Karpathy described part of this problem as "silent collapse" in his recent Dwarkesh podcast: models tend to fall into a local minimum, reusing a few output wording templates across a large number of similar questions. That lack of entropy and diversity is a tough, hard-to-detect problem when doing distillation or synthetic data generation in general. These algorithms, as nice Python functions, are also useful when repurposed for labeling parts of an ontology, topic clusters, etc. [1]. Will definitely star and keep an eye on the repo!

[1] https://jina.ai/news/submodular-optimization-for-text-select...