Hacker News
by DeathArrow, 27 days ago
>This preview runs a 2B model
I guess with a 1B or 500M model, inference would be even faster?
gaeld, 26 days ago
In theory, yes, although the speedup isn't linearly proportional to model size, because in practice our memory streaming isn't perfect yet. There are still some fixed costs that we haven't fully optimized (for now).
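To see why a fixed cost breaks the linear scaling, here is a toy latency model. It is a hypothetical sketch, not the authors' implementation. It assumes decoding is memory-bound, so each generated token streams every weight once, plus a constant per-token overhead. The function name `per_token_latency_ms` and all the numbers (2 bytes per parameter, 100 GB/s bandwidth, 5 ms overhead) are made up for illustration.

```python
# Toy model (assumption): per-token latency = fixed overhead + time to stream all weights once.
# All default values below are hypothetical, chosen only to illustrate the shape of the curve.
def per_token_latency_ms(params_billion, bytes_per_param=2.0,
                         bandwidth_gb_s=100.0, fixed_overhead_ms=5.0):
    weights_gb = params_billion * bytes_per_param          # GB of weights read per token
    streaming_ms = weights_gb / bandwidth_gb_s * 1000.0    # time spent streaming those weights
    return fixed_overhead_ms + streaming_ms


if __name__ == "__main__":
    base = per_token_latency_ms(2.0)
    for size in (2.0, 1.0, 0.5):
        t = per_token_latency_ms(size)
        print(f"{size}B: {t:.1f} ms/token, {base / t:.2f}x faster than 2B")
```

Under these made-up numbers, halving the model from 2B to 1B takes latency from 45 ms to 25 ms, a 1.8x speedup rather than 2x, and going down to 500M gives 3x rather than 4x. The fixed overhead is what keeps the gains below proportional.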