| HN Mirror



	by ryan_glass 380 days ago
	You are right that I haven't been rigorous - it's easy to benchmark tokens/second, but quality of output is more difficult to nail down. I couldn't find any decent comparisons for Unsloth either. So I just tried a few of their models out, looking for something that was 'good enough', i.e. one that does all I need: coding, summarizing documents, troubleshooting anything and everything. I would like to see head-to-head comparisons too - maybe I will invest in more RAM at some stage, but so far I have no need for it. I ran some comparisons between the smaller and larger versions of the Unsloth models and, interestingly (for me anyway), didn't notice a huge difference in quality between them. But the smaller models didn't run significantly faster, so I settled for the biggest model I could fit in RAM with a decent context. For more complex coding I use Deepseek R1 (again the Unsloth version), but since it's a reasoning model it isn't real-time, so it's no use as my daily driver.

2 comments

danielhanchen 380 days ago

Thanks for using our quants and appreciate it :) - We're still doing internal benchmarks since they're very slow to do - but they definitely pass our internal benchmarks :)


ryan_glass 379 days ago

Thank you for making the dynamic quantisations! My setup wouldn't be possible without them and for my personal use, they do exactly what I need and are indeed excellent.


ysosirius 379 days ago

How do you find the quality of the output compares to that of, say, o3 or Sonnet 4?


ryan_glass 379 days ago

To be honest, I haven't used o3 or Sonnet, as the code I work with is my own proprietary code that I like to keep private, which is one reason for the local setup. For troubleshooting day-to-day things I have found it at least as good as the free in-browser version of ChatGPT (not sure which model it uses).
