Hacker News new | ask | show | jobs
by vessenes 244 days ago
This is the spec for a training node. The inference requires 80GB of VRAM, so significantly less compute.
1 comments

The default model is ~0.5B params right?