Hacker News new | ask | show | jobs
by moffkalast 642 days ago
As others point out, it's essentially distillation of a larger model to a smaller one. But you're right, it doesn't work very well. Phi's performance is high on benchmarks but not nearly as good in actual real world usage. It is extremely overfit on a narrow range of topics in a narrow format.