by simonw
53 days ago
There's a bunch of useful information in my comment that's independent of the fact that it drew a pelican:

1. You can run this on a Mac using llama-server and a 17GB downloaded file
2. That version does indeed produce output (for one specific task) that's of a good enough quality to be worth spending more time checking out this model
3. It generated 4,444 tokens in 2min 53s, which is 25.57 tokens/s
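
For anyone who wants to sanity-check the throughput number, here is a rough sketch of the arithmetic. The function names are made up for illustration and are not part of llama-server. Using the rounded 2min 53s gives a slightly higher rate than the reported 25.57 tokens/s, which would be consistent with the exact generation time being a little under 174 seconds (that is an inference from the numbers, not something the server output confirms).

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation rate over the whole run."""
    return tokens / seconds


def implied_seconds(tokens: int, rate: float) -> float:
    """Elapsed time implied by a token count and a reported rate."""
    return tokens / rate


TOKENS = 4444                # token count from the comment
ROUNDED_SECONDS = 2 * 60 + 53  # "2min 53s" as quoted, i.e. 173 seconds
REPORTED_RATE = 25.57        # tokens/s as quoted

# Rate computed from the rounded wall-clock time.
rate_from_rounded = tokens_per_second(TOKENS, ROUNDED_SECONDS)

# Elapsed time that would produce exactly the reported rate.
elapsed_from_rate = implied_seconds(TOKENS, REPORTED_RATE)

print(f"rate from rounded time: {rate_from_rounded:.2f} tokens/s")
print(f"time implied by reported rate: {elapsed_from_rate:.1f} s")
```

The gap between the two is under one second of elapsed time, so the quoted figures agree to within rounding.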
* er, that probably sounds strange, but I did just spend 6 weeks integrating the Willison Trifecta into the app I've been building for 2.5 years, and I considered it a release blocker. It's a simple mental model that is a significant UX accomplishment IMHO.