Hacker News
by pzo 357 days ago
but this is still a great trick if you want to reduce latency or speed up inference, even with local models, e.g. in a realtime chatbot