Or if you want multiple sessions at the same time. Or if you want to do anything else with your machine while it's running.
But realistically, 5 minutes is too long. It should be conversational, and for that you need at least 5 tokens per second. Which your Ryzen just can't do.
This particular thread we are commenting on is about Dolphin Mixtral, which is mostly used for offline code completion (à là Microsoft GitHub Copilot). You don’t want to have to wait 5 minutes at every keystroke to get code suggestions.
In my experience, it takes some experimentation to figure out a good prompt. I don’t think I would have gotten very far off I had to wait that long for each result.
> GPUs are only needed if you can not wait 5 minutes for an answer
Yeah, but that's generally true (or at least, “5 minutes for an answer is very suboptimal”, even if “can’t” isn’t quite true) for interactive use cases, which are... a lot of LLM use cases.
Not sure why you're getting downvoted. It performs decent enough on my Ryzen 3600X with 64GB of RAM. It definitely wouldn't be usable for production or fine-tuning, but it's fine for experimenting.
But realistically, 5 minutes is too long. It should be conversational, and for that you need at least 5 tokens per second. Which your Ryzen just can't do.