by thewataccount 1096 days ago
It's interesting that they appear to have undertrained their 30B model, at least compared to LLaMA/Falcon.

Its coding ability is better, but it's still far behind WizardCoder, which is half the size. Of course, WizardCoder hadn't been released when they started training MPT-30B.

The 8k context is an interesting addition. Are there any standard benchmarks that show how coherently models perform at different context lengths (1k, 2k, 4k, 8k, etc.)?