by thot_experiment 554 days ago
For prompt adherence it still fails on tasks that Gemma2 27B nails every time. I haven't been impressed with any of the Phi family of models. The large context is very nice, though Gemma2 plays very well with self-extend.
2 comments

It's a much smaller model though.

I think the point is more the demonstration that such a small model can have such good performance than any actual usefulness.

Gemma2 9B has significantly better prompt adherence than Llama 3.1 8B in my experience.

I've just assumed it's down to how it was trained, but I'm no expert.

Yeah they mention this in the weaknesses section.

> While phi-4 demonstrates relatively strong performance in answering questions and performing reasoning tasks, it is less proficient at rigorously following detailed instructions, particularly those involving specific formatting requirements.

Ah good catch, I am forever cursed in my preference for snake over camel.