Hacker News new | ask | show | jobs
by MPSimmons 876 days ago
>It should be conversational, and for that you need at least 5 tokens per second.

To be fair, a lot of people are using this for non-interactive work, like batching document analysis or offline processing of user generated content.

2 comments

This particular thread we are commenting on is about Dolphin Mixtral, which is mostly used for offline code completion (à là Microsoft GitHub Copilot). You don’t want to have to wait 5 minutes at every keystroke to get code suggestions.
In my experience, it takes some experimentation to figure out a good prompt. I don’t think I would have gotten very far off I had to wait that long for each result.