|
|
|
|
|
by ThunderSizzle
54 days ago
|
|
Thanks. That is what I suspected. The 3090's in my area seem pretty expensive for a several year old second hand card - they are the same price as a new 5080. 5090 is pretty expensive (~$4000) to justify it over a $10-50 sub. I guess the nice thing is the api side becomes "included", if I ever want to go that route. But if I have a GHCP $40 sub vs a $4000 GC to match it, just on hardware, pay off is at 8 years. If I add in electricity, pay off is probably never. Sure, the sub can go up in price, but the value proposition for self-running doesn't seem to make sense - especially if I can't at least match Sonnet on GHCP or something like that. I hope to self-run some not useless LLMs/Agents at some point, but I think this market needs to stabalize first. I just don't like waiting. |
|
As for models, I'm really genuinely impressed with Gemma4 26B A4B and Qwen3.6 35B A3B right now. Between them, I've seen solid image analysis, good medium-image OCR on very tough images, very good understanding of short stories, good structured data extraction from documents, extremely good language translation, etc. If you wanted to build a custom tool which summarized your inbox/RSS feeds/local news every day, or extracted information from emails and entered it into a database, or automatically captioned images, those tasks are all viable locally. The quality of the results is up dramatically in the last 12 months. At this point, my old personal non-agentic LLM benchmarks are "saturated": All the current leading models score extremely well on literally anything I was asking last year.
It's the true agentic coding workflows where the big models really stand out. And those models are all large enough that the hardware needs to amortized over enough users to run 24 hours/day.