by biophysboy 368 days ago
Could you not cache the top k outputs given a provided input token set? I thought the randomness was applied at the end by sampling the output distribution.
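To make the question concrete, here is a minimal sketch of the idea, not a description of how any real inference stack works. It assumes the forward pass is deterministic, so the top-k distribution for a given input can be computed once and stored, with randomness entering only at the final sampling step. `toy_logits`, `cached_top_k`, `sample_next`, `K`, and `VOCAB` are all hypothetical names made up for illustration, and the cache key is the ordered token tuple rather than an unordered set.

```python
import math
import random
from functools import lru_cache

K = 3        # hypothetical top-k size
VOCAB = 10   # hypothetical toy vocabulary size


def toy_logits(tokens):
    # Deterministic stand-in for a model forward pass (an assumption,
    # not a real model): the same tokens always give the same logits.
    s = sum(tokens)
    return [math.sin(s + i) for i in range(VOCAB)]


@lru_cache(maxsize=None)
def cached_top_k(tokens):
    # tokens must be a hashable tuple so it can serve as the cache key.
    logits = toy_logits(tokens)
    top = sorted(range(VOCAB), key=lambda i: logits[i], reverse=True)[:K]
    # Softmax over the k kept logits, shifted by the max for stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    return tuple((i, w / total) for i, w in zip(top, weights))


def sample_next(tokens, rng):
    # The only random step: draw from the cached top-k distribution.
    ids, probs = zip(*cached_top_k(tuple(tokens)))
    return rng.choices(ids, weights=probs, k=1)[0]
```

Calling `sample_next([1, 2, 3], random.Random())` repeatedly would hit the cache after the first call while still returning varied tokens. Note that this only pays off when the exact same token sequence recurs.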