This is what I'm fiddling with. My 2080 Ti is not quite enough to make it viable. I find the small models fail too often, so I need larger Whisper and LLM models.
The 4060 Ti, for example, would have been a nice fit if it weren't for the narrow 128-bit memory bus, which makes it slower than my 2080 Ti for LLM inference.
A more expensive card has the downside of not being cheap enough to justify idling in my server, and my gaming card is at times busy gaming.
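To put rough numbers on the memory bus point: token generation is usually limited by how fast the weights can be read from VRAM, so peak bandwidth gives a crude ceiling on tokens per second. Here's a back-of-the-envelope sketch using the published bus widths and memory data rates for the two cards. The 8 GB model size and the function names are just assumptions for illustration, and real throughput will land below these ceilings.

```python
# Rough, bandwidth-bound estimate of token generation speed.
# Bus widths and per-pin data rates are the published specs for each card;
# the 8 GB model size is a hypothetical example, not a measured setup.

def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps

def max_tokens_per_s(bandwidth: float, model_size_gb: float) -> float:
    """Upper bound, assuming every weight is read once per generated token."""
    return bandwidth / model_size_gb

cards = {
    "RTX 2080 Ti": bandwidth_gb_s(352, 14.0),  # 352-bit bus, 14 Gbps GDDR6
    "RTX 4060 Ti": bandwidth_gb_s(128, 18.0),  # 128-bit bus, 18 Gbps GDDR6
}
MODEL_SIZE_GB = 8.0  # hypothetical quantized model that fits on either card

for name, bw in cards.items():
    ceiling = max_tokens_per_s(bw, MODEL_SIZE_GB)
    print(f"{name}: {bw:.0f} GB/s, at most ~{ceiling:.0f} tokens/s")
```

On paper the older card has roughly twice the bandwidth, which is the whole story for this kind of workload as long as the model fits in VRAM.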
Absolutely wrong. If you're not clever enough to think of any other reason to run an LLM locally, then don't write off the rest of the world with "well they're just using it for porno!"