by cloudking 911 days ago
Wonderful hack. The overall response latency is the only thing that hurts the UX; if you can get the response time down, it would be epic. Nice work.

Thanks! There are ways to shave off the latency: hosting locally, using quantized/smaller models, and streaming data between steps instead of doing the tasks sequentially.
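
To illustrate the streaming point, here is a rough sketch rather than the project's actual code. The two stages (`produce` and `consume`) are hypothetical stand-ins for whatever steps the pipeline really runs, and the `asyncio.sleep(0)` calls stand in for real work. The event log shows when the first finished result becomes available in each approach.

```python
import asyncio


async def produce(chunks, log):
    """Hypothetical first stage: emits chunks one at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # stand-in for real work (e.g. model output)
        log.append(f"produced {chunk}")
        yield chunk


async def consume(chunk, log):
    """Hypothetical second stage: processes a single chunk."""
    await asyncio.sleep(0)  # stand-in for real work
    log.append(f"consumed {chunk}")
    return chunk.upper()


async def sequential(chunks):
    """Wait for the whole first stage, then run the second stage."""
    log = []
    produced = [c async for c in produce(chunks, log)]
    results = [await consume(c, log) for c in produced]
    return results, log


async def streamed(chunks):
    """Hand each chunk to the second stage as soon as it is produced."""
    log = []
    results = []
    async for c in produce(chunks, log):
        results.append(await consume(c, log))
    return results, log


if __name__ == "__main__":
    for fn in (sequential, streamed):
        results, log = asyncio.run(fn(["a", "b", "c"]))
        print(fn.__name__, results, log)
```

Both versions do the same total work and return the same results. The difference is that the streamed version finishes its first chunk right after that chunk is produced, so the user can see (or hear) something sooner even if the overall runtime is unchanged.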