| Technically, my wife would be a perfect customer, because we literally just prototyped your solution at home. But I'm confused. For context: My wife does leadership coaching and recently used vanilla GPT-4o via ChatGPT to summarize a transcript of an hour-long conversation. Then, last weekend we thought... "Hey, let's test local LLMs for more privacy control. The open source models must be pretty good in 2025." So I installed Ollama + Open WebUI plus the models on a 128GB MacBook Pro. I am genuinely dumbfounded by the results we got today comparing ChatGPT/GPT-4o vs. Llama4, Llama3.3, Llama3.2, DeepSeekR1 and Gemma. In short: Compared to our reference GPT-4o output, none (as in NONE, zero, zilch, nil) of the above-mentioned open source models were able to create even a basic summary from the exact same prompt + text. The open source summaries were offensively bad. It felt like reading the most bland, generic and idiotic SEO slop I've read since I last used Google. None of the obvious topics were part of the summary. Just blah. I tested this with 5 models to boot! I'm not an OpenAI fan per se, but if this is truly OS/SOTA, then we shouldn't even mention Llama4 or the others in the same breath as the newer OpenAI models. What do you think? |
Please shoot me an email at tanya@tinfoil.sh. I would love to work through your use cases.