The fact that a huge amount uf parameters may lead to worse hallucinations is something I didn't think of. Would this somewhat imply that DeepSeek V4 flash should be less prone tho these issues?
Surprisingly not! It is the biggest hallucinator on the AA Omniscience Index just 2pp away from V4 Pro. I think this is partially due to the fact that Flash was trained on >32T tokens just like Pro deapite being almost 10x smaller - it seems somewhat likely it was overfit.