by nacs 263 days ago
I don't know if it will stay this low, but the whole point of v3.2 is to be cheaper to run than v3.1 and earlier versions.

(Inference is now cheaper for them as context grows, because of the sparse attention mechanism.)
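
To put rough numbers on why that matters as context grows, here is a back-of-the-envelope sketch. It only counts query-key pairs under causal attention; the function names and the per-token budget `k = 2048` are my own illustrative assumptions, not figures from the release, and it ignores whatever it costs to pick which tokens to attend to.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs for causal dense attention over n tokens: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def sparse_pairs(n: int, k: int) -> int:
    """Query-key pairs when each token attends to at most k earlier-or-current tokens.

    Hypothetical model of sparse attention: token i looks at min(i, k) tokens.
    """
    if n <= k:
        return dense_pairs(n)
    # First k tokens behave like dense attention, the rest are capped at k each.
    return dense_pairs(k) + (n - k) * k


if __name__ == "__main__":
    k = 2048  # assumed per-token attention budget, for illustration only
    for n in (4_096, 32_768, 131_072):
        ratio = dense_pairs(n) / sparse_pairs(n, k)
        print(f"context={n:>7}  dense/sparse pair ratio ~ {ratio:.1f}x")
```

Under this toy model the dense count grows quadratically with context while the sparse count grows linearly once the context exceeds `k`, so the gap widens the longer the context gets.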