Hacker News
by DeathArrow 27 days ago
>This preview runs a 2B model

I guess with a 1B or 500M model, inference would be even faster?

1 comment

In theory yes, though not in linear proportion to model size: in practice our memory streaming is not yet perfect, and there are still some fixed per-token costs that we have not fully optimized (for now).
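
To make the "not linear" part concrete, here is a toy back-of-the-envelope sketch. It assumes per-token latency is a fixed overhead plus the time to stream all weights once. Every number in it (2 bytes per parameter, 100 GB/s effective bandwidth, 5 ms fixed cost) and both function names are made-up assumptions for illustration, not measurements of this preview.

```python
def per_token_latency_ms(params_billions, bytes_per_param=2.0,
                         bandwidth_gb_s=100.0, fixed_ms=5.0):
    """Toy model: fixed overhead + time to stream every weight once per token.

    All defaults are hypothetical placeholders, not measured values.
    """
    weight_gb = params_billions * bytes_per_param      # 1e9 params * bytes = GB
    stream_ms = weight_gb / bandwidth_gb_s * 1000.0    # seconds -> milliseconds
    return fixed_ms + stream_ms


def speedup(from_billions, to_billions, **kwargs):
    """How much faster the smaller model is under the toy model above."""
    return (per_token_latency_ms(from_billions, **kwargs)
            / per_token_latency_ms(to_billions, **kwargs))


if __name__ == "__main__":
    for size in (1.0, 0.5):
        ideal = 2.0 / size  # what purely linear scaling would give
        print(f"2B -> {size}B: ideal {ideal:.2f}x, toy model {speedup(2.0, size):.2f}x")
```

With these assumed numbers, halving the model gives about 1.8x instead of 2x, and quartering it gives about 3x instead of 4x. The fixed cost does not shrink with the weights, so it takes up a larger share of each token as the model gets smaller.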