| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by logotype 70 days ago
	There’s a lot of wasted compute with stateless inference. We set out to solve that with a new computational model for transformers, and only process the delta between requests. That’s how we can achieve crazy low latency tool calling with LayerScale. Check it out https://layerscale.ai and technical whitepapers out next month!