I wouldn't be so haughty and presumptive of your understanding of things is as they are: this doesn't have practical applications.
No one serious is going to build on some horror of Python interpreter running inside your app to run an LLM when llama.cpp is right there, with more quants available. In practice, on mobile, you run out of RAM headroom way more quickly than CPU headroom. You've been able to run llama.cpp 3B models for almost a year now on iOS, whereas here, they're just starting to be able to. (allocating 6 GB is a quick way to get autokill'd on iOS...2.5GB? Doable)
It looks like spinquant is effectively Q8, in widespread blind testing over months, empirically, we found Q5 is assuredly indistinguishable from the base model.
(edit: just saw your comment. oy. best of luck! generally, I don't bother with these sorts of 'lived experience' details, because no one wants to hear they don't get it, and most LLM comments on HN are from ppl who don't have the same luck as to work on it fulltime. so you're either stuck aggressively asserting you're right in practice and they don't know what you're talking about, or, you're stuck being talked down to about things you've seen, even if they don't match a first-pass based on theory) https://news.ycombinator.com/item?id=41939841
I don’t get the comment. For one I’m excited for developments in the field. Not afraid it will “replace me” as technology has replaced me multiple times over. I’m looking towards working with these models more and more.
No, I meant that a lot of us are working very fast on a pre-launch product, implementing some cutting edge ideas using e.g. the incredible speedup in a small fast inference model like quantized 3B in combination with other tools, and I think there's quite a bit of paranoia out there that someone else will beat you to market. And so not a lot of sharing going on in the comments. At least not as much as previously, and not as much technical discussion vs other non-AI threads on HN.
I’m focused on making models play nice with each other rather than building a feature that relies on it. That’s where I see the more relevant work being. Why such news are exciting!
What kind of fundamental discussion are you hoping to see under an article about an iterative improvement to a known model?
"AI will destroy the world"? "AI is great and will save humanity"? If you're seriously missing that, there's really enough platforms (and articles for more fundamental announcements/propositions on this one) where you can have these.
I mean, this outcome of LLMs is expected and the frequency of LLM drops are too fast, and definitely too fast to wait for Meta to do an annual conference with a ton of hype, and furthermore these things are just prerequisites for a massive lemming rush of altering these models for the real fun, which occurs in other communities