Hacker News new | ask | show | jobs
by logotype 70 days ago
There’s a lot of wasted compute with stateless inference. We set out to solve that with a new computational model for transformers, and only process the delta between requests. That’s how we can achieve crazy low latency tool calling with LayerScale. Check it out https://layerscale.ai and technical whitepapers out next month!