by Alyx1337 916 days ago
Thanks! There are several ways to shave off latency: hosting the models locally, using smaller or quantized models, and streaming data between steps instead of running the tasks sequentially.
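To make the streaming point concrete, here is a rough sketch in Python. The two stages and their names (`produce`, `consume`) are made up for illustration and don't come from any real pipeline. The `events` list just records the order in which work happens, so you can see when the first output becomes available.

```python
from typing import Iterable, Iterator

# Records the order in which work happens (illustration only).
events: list[str] = []

def produce(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical first stage, e.g. a model emitting output chunk by chunk."""
    for chunk in chunks:
        events.append(f"produce:{chunk}")
        yield chunk

def consume(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical second stage that post-processes each chunk."""
    for chunk in chunks:
        events.append(f"consume:{chunk}")
        yield chunk.upper()

def sequential(chunks: Iterable[str]) -> list[str]:
    # Wait for the whole first stage to finish before starting the second.
    stage_one = list(produce(chunks))
    return list(consume(stage_one))

def streamed(chunks: Iterable[str]) -> list[str]:
    # Chain the generators so each chunk flows through both stages right away.
    return list(consume(produce(chunks)))

def work_before_first_output(log: list[str]) -> int:
    # Number of events recorded before the second stage handles its first chunk.
    return next(i for i, event in enumerate(log) if event.startswith("consume:"))
```

Both versions return the same result. The difference is that in the streamed version the second stage starts after a single chunk rather than after the whole first stage, which is where the perceived latency win would come from.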