"On metal" is muddied too. I've heard people refer to web apps running in an OCI container as being "bare metal" deployment, as opposed to AWS or whatever hosting platform.
That's silly, but the idea that "local" is not the opposite of remote is even sillier.
If you define bare metal as not running under a VM, it fits. OCI on Linux is just cgroups, so I'd say that counts as not a VM. Or at least it's a layer closer to the metal than a typical VM running OCI images.
You can run an OCI container on bare metal though. It doesn't stop running on bare metal just because it's running in kernel namespaces, aka a Docker container.
Lots of people were advocating for running their k8s on bare metal servers to maximize the performance of their containers
Now whether that applies to your conversation... I've no clue, too little context ( 。 ŏ ﹏ ŏ )
Bare metal in the context of running software is a technical term with a clear meaning that hasn't become contested like "AI" or "Crypto" - and that meaning is that the software is running directly on the hardware.
Since k8s isn't virtualization, processes spawned by its orchestrator are still running on bare metal. That's the whole reason containers are more efficient than virtual machines.
H100s can be $2 an hour, so $192 an hour for the full cluster. They report 22k tokens per second, so ~80 million an hour; that's $16 an hour of revenue at $0.2 per million. Maybe a bit more for input tokens, but it seems a long way off.
I think you misread. That's 22k tokens per second per node, so per 8 H100s. With 12 nodes they get 264k tokens per second, or about 950 million an hour. That gets you to roughly $0.202 per million at $2 an hour.
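For anyone who wants to check the arithmetic, here's a rough Python sketch of both readings. The numbers are the ones quoted in this thread ($2 per H100-hour, 8 GPUs per node, 12 nodes, 22k tokens per second); the function and variable names are my own, and it ignores input tokens, utilization, and everything else that isn't GPU rental.

```python
# Back-of-the-envelope cost per million output tokens.
# All figures are the ones quoted in the thread, not measured values.
GPU_PRICE_PER_HOUR = 2.0          # USD per H100-hour (assumed rental price)
GPUS_PER_NODE = 8
NODES = 12
TOKENS_PER_SEC_REPORTED = 22_000  # the reported throughput figure


def cost_per_million_tokens(tokens_per_sec, gpus, gpu_price_per_hour=GPU_PRICE_PER_HOUR):
    """GPU rental cost in USD per one million generated tokens."""
    cluster_cost_per_hour = gpus * gpu_price_per_hour
    millions_of_tokens_per_hour = tokens_per_sec * 3600 / 1_000_000
    return cluster_cost_per_hour / millions_of_tokens_per_hour


total_gpus = NODES * GPUS_PER_NODE  # 96 GPUs, i.e. $192 an hour

# Reading 1: 22k tokens/s is the throughput of the whole cluster.
whole_cluster = cost_per_million_tokens(TOKENS_PER_SEC_REPORTED, total_gpus)

# Reading 2: 22k tokens/s is per node, so the cluster does 12x that.
per_node = cost_per_million_tokens(TOKENS_PER_SEC_REPORTED * NODES, total_gpus)

print(f"whole-cluster reading: ${whole_cluster:.2f} per million tokens")  # $2.42
print(f"per-node reading:      ${per_node:.3f} per million tokens")       # $0.202
```

So under the first reading the GPU cost is about 12x the $0.2 price, and under the per-node reading it lands almost exactly on it.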