by snowfield 876 days ago
Or if you want multiple sessions at the same time. Or if you want to do anything else with your machine while it's running.

But realistically, 5 minutes is too long. It should be conversational, and for that you need at least 5 tokens per second. Which your Ryzen just can't do.
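For a rough sense of the numbers, here is a back-of-the-envelope sketch. The 300-token reply length is my own assumption, not a measurement from this thread, and the function ignores prompt processing time. Under that assumption, 1 token per second is exactly the 5-minute wait, and 5 tokens per second brings it down to a minute.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate.

    Simplification: ignores prompt processing and any warm-up cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed reply length (hypothetical, chosen only for illustration).
REPLY_TOKENS = 300

for rate in (1.0, 5.0, 20.0):
    wait = generation_seconds(REPLY_TOKENS, rate)
    print(f"{rate:5.1f} tok/s -> {wait:6.1f} s")
```

Swap in your own reply length and measured tokens per second to see where your hardware lands.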

1 comment

>It should be conversational, and for that you need at least 5 tokens per second.

To be fair, a lot of people are using this for non-interactive work, like batching document analysis or offline processing of user generated content.

This particular thread we are commenting on is about Dolphin Mixtral, which is mostly used for offline code completion (à la Microsoft GitHub Copilot). You don’t want to have to wait 5 minutes at every keystroke to get code suggestions.
In my experience, it takes some experimentation to figure out a good prompt. I don’t think I would have gotten very far if I had to wait that long for each result.