by gorbypark 1199 days ago
It's interesting that the fidelity seems to change. I just realized I had been running with `-t 8` even though I only have an M2 MacBook Air (4 performance, 4 efficiency cores), and running with `-t 4` speeds up 13B significantly. It's now doing ~160ms per token versus ~300ms per token with the `-t 8` setting. It's hard to quantify exactly whether it's changing the output quality much, but I might do a subjective test with 5 or 10 runs on the same prompt and see how often the output is factual versus "nonsense".
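
For anyone who wants to put numbers on this, here is a rough Python sketch, not anything from the tool itself. It converts the per-token latencies above into tokens per second and a speedup ratio (roughly 1.9x going from `-t 8` to `-t 4`), and adds a small helper for the subjective test. The function names and the idea of hand-labeling each run as factual or not are my own assumptions.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new setting is than the old one."""
    return old_ms / new_ms


def factual_rate(labels: list[bool]) -> float:
    """Fraction of runs judged factual.

    `labels` is a hypothetical list of hand-assigned judgments,
    one per run on the same prompt (True = factual, False = nonsense).
    """
    if not labels:
        raise ValueError("need at least one labeled run")
    return sum(labels) / len(labels)


if __name__ == "__main__":
    # Latencies reported in the comment above (approximate).
    ms_t8 = 300.0
    ms_t4 = 160.0
    print(f"-t 8: {tokens_per_second(ms_t8):.2f} tokens/s")
    print(f"-t 4: {tokens_per_second(ms_t4):.2f} tokens/s")
    print(f"speedup: {speedup(ms_t8, ms_t4):.2f}x")
```

To compare quality, you would label 5 or 10 runs per thread setting by hand and pass each list to `factual_rate`. With so few runs the result is only a rough indication, not a measurement.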