by y2244
387 days ago
Pick up a used 3090 with more VRAM. It holds its value, so you won't lose much, if anything, when you resell it. But otherwise, as said, install Ollama and/or Llama.cpp and run the model using the --verbose flag. This prints the tokens-per-second rate after each prompt's response is returned. Then find the best model that gives you a tokens-per-second speed you are happy with. And as also said, 'abliterated' models are less censored versions of the normal ones.
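If you would rather compare models with a script than eyeball the --verbose output, here is a rough sketch. It assumes you take the `eval_count` and `eval_duration` fields (the latter in nanoseconds) from Ollama's API response. The function names, model labels, and numbers are made up for illustration and are not real measurements.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def fast_enough(speeds: dict[str, float], min_tps: float) -> list[str]:
    """Names of models at or above min_tps, fastest first."""
    keep = [name for name, tps in speeds.items() if tps >= min_tps]
    return sorted(keep, key=lambda name: speeds[name], reverse=True)


if __name__ == "__main__":
    # Placeholder figures only: substitute the values from your own runs.
    runs = {
        "model-a": (256, 4_000_000_000),
        "model-b": (256, 16_000_000_000),
    }
    speeds = {name: tokens_per_second(n, ns) for name, (n, ns) in runs.items()}
    for name in fast_enough(speeds, min_tps=20.0):
        print(f"{name}: {speeds[name]:.1f} tokens/s")
```

Set `min_tps` to whatever speed you find comfortable, then pick the model you like best from what is left.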