by daemonologist
441 days ago
To some extent the "mystery" (and temporary free-as-in-beer-ness) of this model might be getting to me, but I think it's pretty interesting. Given the token throughput (250B this week) it's obvious there's a pretty major player behind the model, but why is it stealthed? Maybe there's something about the architecture or training that would put people off if it was public right off the bat? Maybe they're purely collecting usage/acceptance data and want unbiased users?

On the Aider Polyglot leaderboard it's ~middle of the leading pack, comparable to DeepSeek V3 and 3.5 Sonnet. I ran NoLi(teral)Ma(tching), an unsaturated long-context benchmark, on it and was impressed though:

= Model =========== Base Score = 8K Context = 16K Context =
Quasar Alpha: >=97.8% 89.2% 85.1%
GPT-4o: 99.3% 89.2% 81.6%
Llama 3.3 70B: 97.3% 72.1% 59.5%
Gemini 1.5 Pro: 92.6% 63.9% 55.5%
Claude 3.5 Sonnet: 87.6% 61.7% 45.7%
Gemini 1.5 Flash: 84.7% 44.4% 35.5%
GPT-4o mini: 84.9% 32.6% 20.6%
Llama 3.1 8B: 76.7% 31.9% 22.6%
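
One way to read the table above is to ask how much of each model's base score is still there at 16K context. A minimal sketch, using only the numbers listed in the table; "retention" is a name introduced here for illustration and is not an official NoLiMa metric, and Quasar Alpha's base score is reported as ">=97.8%", so the value used for it is a lower bound:

```python
# Scores copied from the table above, in percent: (base, 8K, 16K).
# Assumption: Quasar Alpha's base is listed as ">=97.8%", so 97.8 is a lower bound
# and its computed retention is therefore an upper bound.
scores = {
    "Quasar Alpha": (97.8, 89.2, 85.1),
    "GPT-4o": (99.3, 89.2, 81.6),
    "Llama 3.3 70B": (97.3, 72.1, 59.5),
    "Gemini 1.5 Pro": (92.6, 63.9, 55.5),
    "Claude 3.5 Sonnet": (87.6, 61.7, 45.7),
    "Gemini 1.5 Flash": (84.7, 44.4, 35.5),
    "GPT-4o mini": (84.9, 32.6, 20.6),
    "Llama 3.1 8B": (76.7, 31.9, 22.6),
}


def retention(base, long_ctx):
    """Share of the base score that survives at the longer context, in percent."""
    return 100.0 * long_ctx / base


# Rank models by how much of their base score they keep at 16K context.
ranked = sorted(
    scores.items(),
    key=lambda kv: retention(kv[1][0], kv[1][2]),
    reverse=True,
)

for name, (base, _ctx8, ctx16) in ranked:
    print(f"{name:<18} keeps {retention(base, ctx16):5.1f}% of base at 16K")
```

With these numbers Quasar Alpha keeps at most about 87% of its base score at 16K, and even if its true base score were 100% it would keep 85.1%, which is still ahead of GPT-4o at roughly 82%.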

It also performs well - slightly better than GPT-o1 - on the "hard" subset at 16K context with 62.8%. Latency is quite good as well.

More details: https://old.reddit.com/r/LocalLLaMA/comments/1ju1czn/quasar_...