Hacker News new | ask | show | jobs
by diggan 475 days ago
Could you maybe share a lightweight benchmark where you share the exact model (+ quantization if you're using that) + runtime + used settings and how much tokens/second you're getting? Or just like a log of the entire run with the stats, if you're using something like llama.cpp, LMDesktop or ollama?

Also, would be neat if you could say what AI services you were subscribed to, there is a huge difference between paid Claude subscription and the OpenAI Pro subscription for example, both in terms of cost and the quality of responses.