by selfhoster11 1173 days ago
Try quantised models. They perform reasonably well, although if you want to do this properly you should run some benchmarks to check how much quality you lose.
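For anyone unfamiliar with what quantisation does, here is a toy sketch, assuming simple symmetric per-tensor int8 quantisation. This is not how any particular library does it; real toolkits usually use finer-grained (per-channel or per-block) scales and lower bit widths. The helper names (`quantize_int8`, `dequantize`, `max_abs_error`) and the sample weights are made up for illustration. The "benchmark" here just measures reconstruction error. A real benchmark would compare the quantised model's outputs against the full-precision model on your own task.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float values."""
    return [v * scale for v in q]


def max_abs_error(original, restored):
    """Crude quality check: worst-case difference per weight."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
err = max_abs_error(weights, restored)
print("int8 values:", q)
print("scale:", scale)
print("max abs error:", err)
```

With round-to-nearest, each weight's error is at most half the scale, so the coarser the scale (fewer bits or a wider value range), the more quality you can expect to lose. This is why measuring the effect on your actual workload is worth doing.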