HN Mirror

Hacker News


	by DeathArrow 27 days ago
	>This preview runs a 2B model

	I guess inference with a 1B or 500M model would be even faster?

1 comment

gaeld 26 days ago

In theory, yes, though not proportionally: in practice our memory streaming is not yet perfect, and there are still some fixed costs that we have not fully optimized (for now).
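To make the non-proportional scaling concrete, here is a toy back-of-the-envelope model. It assumes decoding is memory-bound (every weight is streamed once per generated token) at some fraction of peak bandwidth, plus a fixed per-token overhead. The function name `per_token_seconds` and every number in it (bandwidth, efficiency, overhead, 16-bit weights) are hypothetical illustrations, not measurements of the preview.

```python
def per_token_seconds(params, bytes_per_param, bandwidth_bytes_s, efficiency, fixed_s):
    """Hypothetical toy decode-time model: stream every weight once per token
    at a fraction `efficiency` of peak bandwidth, plus a fixed per-token cost."""
    streaming_s = params * bytes_per_param / (bandwidth_bytes_s * efficiency)
    return fixed_s + streaming_s


# Illustrative numbers only (assumptions, not measured values).
BANDWIDTH = 100e9   # peak memory bandwidth: 100 GB/s
EFFICIENCY = 0.6    # achieved fraction of peak, i.e. imperfect memory streaming
FIXED = 2e-3        # 2 ms per token that does not shrink with model size
BYTES = 2           # 16-bit weights

base = per_token_seconds(2e9, BYTES, BANDWIDTH, EFFICIENCY, FIXED)
for params in (2e9, 1e9, 0.5e9):
    t = per_token_seconds(params, BYTES, BANDWIDTH, EFFICIENCY, FIXED)
    print(f"{params / 1e9:.1f}B params: {t * 1e3:.1f} ms/token, "
          f"speedup vs 2B: {base / t:.2f}x")
```

With a nonzero fixed cost, halving the parameter count yields less than a 2x speedup; setting `fixed_s=0` (and holding efficiency constant) brings the ratio back to exactly 2x. In practice efficiency may also vary with model size, which this sketch ignores.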
