by selicos
34 days ago
Their more recent post seems to suggest it was worthwhile. https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri... Abstract/TLDR: LLMs are notoriously formulaic at writing, overusing certain tokens or phrases. I show that models trained with SFT fail to match the distribution of the training data by using Maximum Mean Discrepancy (MMD), Judge Model Quality (JMQ), and L2 Token Distribution.
The raw infra being local didn't enable any of that. Now, if you were building ASICs at TSMC, that would be a different thing, because you'd then be using something different locally.