by SCHiM 934 days ago
That is actually possible. For example, someone wrote Python code to do this for BLOOM, the massive open-source model.

However, it's still slow as tar. When I was running the BLOOM model, I think my inference speed was about 1 token per minute.
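
To put that rate in perspective, here is a rough back-of-the-envelope sketch. The 200-token reply length and the helper name `generation_time_minutes` are made up for illustration; only the roughly 1 token per minute figure comes from the run described above.

```python
def generation_time_minutes(num_tokens: int, tokens_per_minute: float = 1.0) -> float:
    """Minutes needed to generate num_tokens at a fixed generation rate."""
    if tokens_per_minute <= 0:
        raise ValueError("tokens_per_minute must be positive")
    return num_tokens / tokens_per_minute


# Assumed example: a 200-token reply at roughly 1 token per minute.
minutes = generation_time_minutes(200)
print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours)")
```

So even a short answer would take hours at that speed, which is fine for a one-off experiment but not for anything interactive.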

See: https://towardsdatascience.com/run-bloom-the-largest-open-ac...