HN Mirror


by SamDc73 220 days ago

Even with something like an RTX 5090, I'd still run Q4_K_S/Q4_K_M quants, because they're far more resource-efficient for inference (much less VRAM and memory bandwidth per generated token than BF16 weights).
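To put rough numbers on "far more resource-efficient", here's a back-of-envelope sketch. It assumes llama.cpp's Q4_K block layout: 256 weights per super-block, stored as two fp16 scales, 12 bytes of packed 6-bit sub-block scales/mins, and 128 bytes of 4-bit quants, which works out to 4.5 bits per weight. Q4_K_S and Q4_K_M keep some tensors at higher-precision types, so their real average is a bit above that. The 32B model size and the 5090 figures (32 GB VRAM, about 1792 GB/s) are assumptions for illustration, and only weights are counted (no KV cache, activations, or runtime overhead).

```python
# Back-of-envelope weight memory and decode ceiling: BF16 vs. a Q4_K-style format.
# Assumptions: only weights are counted; model size and GPU specs are illustrative.

QK_K = 256  # weights per Q4_K super-block (llama.cpp layout)


def q4_k_bits_per_weight() -> float:
    """Average bits per weight of one Q4_K super-block."""
    d_and_dmin = 2 * 2       # two fp16 super-block scales (d, dmin)
    sub_scales = 12          # packed 6-bit scales/mins for 8 sub-blocks of 32 weights
    quants = QK_K * 4 // 8   # 256 weights at 4 bits each = 128 bytes
    block_bytes = d_and_dmin + sub_scales + quants  # 144 bytes total
    return block_bytes * 8 / QK_K                   # 4.5 bits per weight


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed just for the weights, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


def decode_ceiling_tok_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Bandwidth ceiling for single-stream decoding, if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


VRAM_GB = 32           # assumed RTX 5090 VRAM
BANDWIDTH_GB_S = 1792  # assumed RTX 5090 memory bandwidth
N_PARAMS = 32e9        # hypothetical 32B-parameter model

if __name__ == "__main__":
    for name, bpw in [("BF16", 16.0), ("Q4_K", q4_k_bits_per_weight())]:
        gb = weight_gb(N_PARAMS, bpw)
        fits = "fits" if gb < VRAM_GB else "does not fit"
        # Ceiling is computed as if the weights were resident in VRAM.
        tps = decode_ceiling_tok_s(gb, BANDWIDTH_GB_S)
        print(f"{name}: {bpw:.1f} bpw, {gb:.1f} GB weights ({fits} in {VRAM_GB} GB), <= {tps:.1f} tok/s")
```

This prints `BF16: 16.0 bpw, 64.0 GB weights (does not fit in 32 GB), <= 28.0 tok/s` and `Q4_K: 4.5 bpw, 18.0 GB weights (fits in 32 GB), <= 99.6 tok/s`. The throughput number is only a ceiling: single-stream token generation is mostly memory-bandwidth bound, so cutting bytes per weight raises that ceiling roughly in proportion, which is most of why 4-bit quants still make sense on a big card.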

Also, the RTX 3090 supports NVLink (the 4090 and 5090 don't), which is actually more useful for multi-GPU inference speed than native BF16 support.

Maybe BF16 matters if you're training?


That's a smart thing to do, considering the 5090 has tensor cores with native 4-bit precision support...