by bcjdjsndon 20 days ago
Shame you stopped short of actually benchmarking that scale though, eh?

Will do. We are a small team, and it takes time to implement and optimize a new model, whatever its size.
You don't even need to train the model just to see whether you can run inference at the claimed speed.
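To make that point concrete, here is a minimal sketch, assuming a plain dense network where the amount of arithmetic per forward pass depends only on the layer shapes and not on the weight values. The layer widths, function names, and run count are all hypothetical, and this is pure Python for illustration rather than anything resembling an optimized kernel. Models with data-dependent routing or sparsity tricks may behave differently with random weights.

```python
import random
import time

def make_random_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix of random values (no training involved)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, x):
    """Plain MLP forward pass: matrix-vector product followed by ReLU, per layer."""
    for w in layers:
        x = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    return x

def benchmark(layers, n_in, n_runs, rng):
    """Time n_runs forward passes; return (last_output, passes_per_second)."""
    inputs = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_runs)]
    start = time.perf_counter()
    out = None
    for x in inputs:
        out = forward(layers, x)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return out, n_runs / elapsed

rng = random.Random(0)
dims = [64, 128, 128, 32]  # hypothetical layer widths, chosen only for the example
layers = [make_random_layer(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]
out, rate = benchmark(layers, dims[0], 50, rng)
print(f"output size: {len(out)}, forward passes/s: {rate:.1f}")
```

The outputs are meaningless since nothing was trained, but the timing exercises the same shapes a trained model of that architecture would.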
True, and for third-party models we'll just re-use their public open weights.

There is a time-consuming part, though, that our (human) team performs manually: implementing the model's logic in heavily optimized C++ and assembly code, co-designed for each specific hardware card.

This can take months.

We hope to accelerate the process with AI agents, but we're not there yet.

Oh