by smcnally 409 days ago
`model.safetensors` for Qwen3-0.6B is a single 1.5GB file.

Qwen3-235B-A22B has 118 `.safetensors` files at 4GB each.

There are a bunch of models and quants between those.

1 comment

Does it run on 8x80G? Or do the KV cache and other buffers push it over the edge?
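A rough back-of-envelope for the 8x80G question, as a sketch rather than a measurement. It takes the 118 shards at 4GB each from the post as an upper bound on weight size, treats "G" as decimal GB, and assumes the weights load at their stored precision. The KV-cache shape values (`layers`, `kv_heads`, `head_dim`, `bytes_per_elem`) are illustrative placeholders, not read from the model's `config.json`, and the helper names are made up for this example. Activation buffers, framework overhead, and fragmentation are ignored, so the real margin is smaller.

```python
GB = 10**9  # decimal gigabytes; assumption, the thread does not say GB vs GiB


def weight_bytes(num_shards: int, shard_gb: float) -> int:
    """Upper bound on weight size: shard count times per-shard size."""
    return int(num_shards * shard_gb * GB)


def headroom_bytes(num_gpus: int, gpu_gb: float, weights: int) -> int:
    """Total GPU memory left after the weights are loaded."""
    return int(num_gpus * gpu_gb * GB) - weights


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """KV cache per token: one K and one V vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


# Numbers from the post: 118 shards at 4GB, on 8 GPUs with 80GB each.
weights = weight_bytes(118, 4)
headroom = headroom_bytes(8, 80, weights)

# Placeholder shape values (assumptions); check config.json for the real ones.
per_token = kv_bytes_per_token(layers=94, kv_heads=4, head_dim=128,
                               bytes_per_elem=2)

print(f"weights:  {weights / GB:.0f} GB")
print(f"headroom: {headroom / GB:.0f} GB across all GPUs")
print(f"KV cache: {per_token} bytes per token")
print(f"tokens that fit in headroom (all sequences combined): "
      f"{headroom // per_token}")
```

Under those assumptions the weights alone fit with room to spare, so whether it goes over the edge depends on context length, batch size, and how much the serving stack reserves for its own buffers.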