Hacker News

by bityard 34 days ago

The premise of "best" makes this a non-starter for me right away. The site says its definition of "best" is 1) it fits in RAM and 2) it has a high benchmark score.

Best for what? All models have their strengths and weaknesses. Controlling for the number of parameters, some are better at general knowledge, some are better at writing and planning, some have more creativity, some are better at writing code, some are better at debugging code, etc., et al., and so on.

The "best" model is not "whatever fits into VRAM." You can do lots of useful stuff with a small CPU-only model. Just a few days ago, there was a 29M-parameter model optimized for nothing but tool calling.
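For a rough sense of scale, the memory taken by the weights is approximately the parameter count times the bits per weight, divided by 8. The Python sketch below is a back-of-the-envelope estimate under that assumption only. It ignores the KV cache, activations, and runtime overhead, and the 8B/70B sizes and 16-bit/4-bit precisions are illustrative picks, not figures from the comment.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: ignores KV cache, activations, and runtime overhead,
    so treat the result as a lower bound, not a requirement.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # The 29M size comes from the comment; 8B and 70B are illustrative only.
    for name, params in [("29M", 29e6), ("8B", 8e9), ("70B", 70e9)]:
        for bits in (16, 4):
            gib = weight_memory_gib(params, bits)
            print(f"{name:>4} @ {bits:>2}-bit: {gib:8.2f} GiB")
```

By this estimate, a 29M-parameter model needs only about 58 MB of weights at 16-bit precision, so RAM is rarely the deciding factor at that size.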

Last and probably most controversial: the idea that LLM benchmark scores have any actual real-world value whatsoever is a collective hallucination. They are for marketing and serve no other purpose. New LLMs are always specifically trained to score high on the benchmarks the developers want. Somehow, every new release of every new model _always_ shows it scoring slightly above the models it claims as its competitors on most tests. Since LLM output is non-deterministic, you often get wildly different responses to identical prompts, and it is trivial for the developers to cherry-pick results. Since they never show their work, we are expected to take them at their word.
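To make the cherry-picking point concrete, here is a minimal Python simulation. It assumes each benchmark question is an independent pass/fail draw with a fixed probability, which is a simplification since real model errors are correlated. The point it illustrates is that if only the best of several runs is reported instead of the average, the score goes up even though the model never changes. The pass rate, benchmark size, and run count are hypothetical values chosen for illustration.

```python
import random


def run_benchmark(pass_rate: float, n_questions: int, rng: random.Random) -> float:
    """Score one benchmark run; each question passes independently with pass_rate.

    Hypothetical model of non-determinism: real LLM errors are not
    independent coin flips.
    """
    passed = sum(rng.random() < pass_rate for _ in range(n_questions))
    return passed / n_questions


def best_of_n(pass_rate: float, n_questions: int, n_runs: int,
              seed: int = 0) -> tuple[float, float]:
    """Return (mean score, best score) over n_runs runs of the same model."""
    rng = random.Random(seed)
    scores = [run_benchmark(pass_rate, n_questions, rng) for _ in range(n_runs)]
    return sum(scores) / len(scores), max(scores)


if __name__ == "__main__":
    # Same hypothetical model (true pass rate 0.70) on a 200-question benchmark.
    mean, best = best_of_n(pass_rate=0.70, n_questions=200, n_runs=20)
    print(f"mean of 20 runs: {mean:.3f}  best single run: {best:.3f}")
```

The gap between the two printed numbers comes purely from sampling noise; nothing about the underlying "model" differs between runs.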

Yes, you need to know if the model will fit into your RAM, and whether the speed will be acceptable. But the _only_ way to know whether a model is suitable for your specific task is to try it out for yourself and see if it does (most of the time) the thing you need it to do.
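One way to "try it out for yourself" while accounting for non-determinism is to run the same prompt many times and count how often the output passes a check you write for your own task. A minimal Python sketch follows. `pass_rate`, `fake_generate`, and the check are hypothetical names, and `fake_generate` is a random stub standing in for whatever model client you actually use.

```python
import random
from typing import Callable


def pass_rate(generate: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], trials: int = 10) -> float:
    """Fraction of `trials` runs of the same prompt whose output passes `check`."""
    passes = sum(check(generate(prompt)) for _ in range(trials))
    return passes / trials


if __name__ == "__main__":
    rng = random.Random(42)

    def fake_generate(prompt: str) -> str:
        # Hypothetical stand-in for a real model call; replace with your own client.
        return "4" if rng.random() < 0.8 else "five"

    # The check encodes what "does the thing you need" means for this task.
    rate = pass_rate(fake_generate, "What is 2 + 2? Answer with a digit.",
                     check=lambda out: out.strip() == "4", trials=50)
    print(f"passed {rate:.0%} of trials")
```

The design choice here is that the check, not a public benchmark, defines success, and the repeated trials turn "most of the time" into a number you can compare across models.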