| I disagree. It is not the model alone. It needs a system which capitalizes on it. And this is very complex. Hardware, software, architecture - it takes a lot to get it right. Try running the latest OS models on a normal Mac or PC. Claude Fable and Mythos are systems not just pure models. And of course marketing. Don't believe the hype. I think Claude is often times underwhelming. Security concerns are also a concern companies have a blond spot for. The really toughest pro security (Yes, pro! Totally different framing!) company I know is Google after all. What I can companies advise to do is, really having more than just bug bounties but a professional hacker team that does nothing else but attacking them the whole day and night 24/7. This needs to be coordinated with the government otherwise you might sound an alarm and will be SWATed for doing good. And I would pay them huge sums since the risk and fallout warrant such a treatment, not the standard wage. Hackers are the real deal, not AI. Proof: Hackers using AI. |
It can be done through the magic of SSD offload. The worst case involves seconds-per-token speeds, but that's OK if you only care about low volumes of slow unattended inference, which maximizes utilization for the hardware.
(The real worst case, where you're streaming the whole model from the cheapest storage you could feasibly think of, involves multiple minutes per token for a single inference, or even hours per token batch if you're doing many inferences in bulk. That's a lot less helpful, so there's a space for smaller models at the edge, even for unattended workloads.)