Hacker News
by timtom123 532 days ago
So much spam around this model. LocalLLaMA is stuffed with spam posts and even Hacker News is getting spammed. Who has actually run this model and verified performance? Does anyone know of a decent review from a trustworthy source?
3 comments

Where’s the spam?

I scrolled dozens of posts without seeing a single mention of this—the biggest (certainly the most interesting) LLM news recently. When something big happens with Claude or ChatGPT there are more posts, but nobody calls that “spam”.

Anyways, if you were actually following LocalLLaMA (a subreddit about running LLMs locally, where this is by far the biggest and most relevant news topic currently) you’d have seen this post https://www.reddit.com/r/LocalLLaMA/s/Yay5njt963 where a guy is working on running DeepSeek on llama.cpp and demonstrates ~8 tok/s using a CPU.

I am not GPU poor and don't care about speed. I care about how good the model is, which is much harder to measure and much harder to do well. I have not seen many independent reviews. There are finally some coming out now, but a lot of this is just marketing hype to drive attention. Every AI company does it.
Just because you’re out of touch with the community, and your wishes don’t align with the rest of us, doesn’t mean a major event is “spam”.
The LMSYS leaderboards are crowdsourced and would be hard to fake, and they show pretty strong performance in terms of human preference.
Crowdsourced data is the easiest to fake unless you can somehow ensure that you have a completely unbiased population (which is impossible). There's a reason why certain models do so well on upvote-based leaderboards but rank nowhere on objective tests.
Which ones? I think fine-tunes are where I see most of this (I'll just call it) "model spam", but the base models don't seem to exhibit this behavior. I do see some models perform way below the curve compared to their benchmark performance, though (Phi family being the most famous).
I've tried it. It's average at best. Nothing comparable to ChatGPT.