


	by marcinzm 941 days ago
	Bottleneck for larger models; however, this would presumably allow for cheaper models at scale or on compute-constrained devices (like phones).


entropicdrifter 940 days ago

And potentially for distributing a model across several devices at inference time. You could devote a cluster of smaller/weaker machines to inference.


sroussey 940 days ago

You can do that today; the only advantage right now, though, is being able to fit the model in memory. Each request passes through the devices sequentially, so it's slower due to communication costs, though batching might be faster?

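To make the latency-versus-batching point concrete, here is a rough back-of-the-envelope sketch of a model split into sequential stages across machines. It is only a toy cost model, not a measurement: the function names (`pipeline_latency`, `pipeline_throughput`), the per-stage times, and the transfer time are all hypothetical, and it assumes transfers can overlap with compute once several requests are in flight.

```python
def pipeline_latency(stage_times, comm_time):
    """Seconds for one request to pass through every stage in order.

    stage_times: compute time per device (hypothetical values, in seconds)
    comm_time:   time to ship activations between adjacent devices
    """
    hops = len(stage_times) - 1
    return sum(stage_times) + comm_time * hops


def pipeline_throughput(stage_times, comm_time, in_flight):
    """Approximate steady-state requests per second.

    Assumes transfers overlap with compute, so the slowest single step
    (a stage or a transfer) caps throughput once the pipeline is full.
    """
    latency = pipeline_latency(stage_times, comm_time)
    slowest_step = max(max(stage_times), comm_time)
    # Few requests in flight: bounded by latency. Many: bounded by the slowest step.
    return min(in_flight / latency, 1.0 / slowest_step)


if __name__ == "__main__":
    stages = [0.010, 0.010, 0.010, 0.010]  # made-up: 4 machines, 10 ms each
    comm = 0.005                           # made-up: 5 ms per hop

    print("single machine (if it fit):", sum(stages), "s per request")
    print("4-machine pipeline:", pipeline_latency(stages, comm), "s per request")
    for n in (1, 2, 8):
        print(n, "in flight:", pipeline_throughput(stages, comm, n), "req/s")
```

Under these assumed numbers a single request is slower on the cluster than on one machine because of the extra hops, but keeping several requests in flight lets the stages work in parallel, which is the sense in which batching might recover throughput.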