HN Mirror


	by timtom123 532 days ago
	So much spam around this model. LocalLLaMA is stuffed with spam posts and even Hacker News is getting spammed. Who has actually run this model and verified its performance? Does anyone know of a decent review from a trustworthy source?

3 comments

starfezzy 531 days ago

Where’s the spam?

I scrolled dozens of posts without seeing a single mention of this—the biggest (certainly the most interesting) LLM news recently. When something big happens with Claude or ChatGPT there are more posts, but nobody calls that “spam”.

Anyways, if you were actually following LocalLLaMA (a subreddit about running LLMs locally, where this is by far the biggest and most relevant news topic currently) you’d have seen this post https://www.reddit.com/r/LocalLLaMA/s/Yay5njt963 where a guy is working on running DeepSeek on llama.cpp and demonstrates ~8 tok/s on a CPU.

timtom123 530 days ago

I am not GPU-poor and don't care about speed. I care about how good the model is, which is much harder to measure. I have not seen many independent reviews. Some are finally coming out now, but a lot of this is just marketing hype to drive attention. Every AI company does it.

starfezzy 530 days ago

Just because you’re out of touch with the community, and your wishes don’t align with the rest of us, doesn’t mean a major event is “spam”.

x_may 532 days ago

The LMSYS leaderboards are crowdsourced and would be hard to fake, and they show it performing pretty strongly in terms of human preference.

paxys 532 days ago

Crowdsourced data is the easiest to fake unless you can somehow ensure that you have a completely unbiased population (which is impossible). There's a reason why certain models do so well on upvote-based leaderboards but rank nowhere on objective tests.

CGamesPlay 532 days ago

Which ones? I think fine-tunes are where I see most of this (I'll just call it) "model spam", but the base models don't seem to exhibit this behavior. I do see some models perform way below the curve compared to their benchmark performance, though (Phi family being the most famous).

feverzsj 532 days ago

I've tried it. It's average at best. Nothing comparable to ChatGPT.
