by brucethemoose2 846 days ago
Yi 34B 200K finetunes (like Tess 1.5), DeepSeek Coder 33B and Miqu 70B definitely outpace ChatGPT-3.5, at least for me.

They don't have the augmentations of being a service, but generally they are smarter, have a bigger context and (perhaps most importantly) are truly unbound.

I am on a single 3090 desktop, for reference. Admittedly, this is much more expensive now than it was a few months ago, with the insane prices used 3090s are going for now.


Damn, I see. How many tokens per second do you get on that setup?

On an M2 MacBook I get ~10-12 t/sec, which is a tad too slow for continued daily use, but if I think it's worth it I might invest in a more powerful machine soon-ish!

On 33B/34B models I get 35 tokens/sec, way faster than I can read it streaming in. At huge contexts (like 30K-74K), prompt processing takes forever and token generation is slower, but it's still faster than I can read.

Miqu 70B is slow (less than 10 tok/sec, I think) because it doesn't fit in the 3090's 24 GB of VRAM, so I have to split it between the GPU and CPU with llama.cpp. I only use it for short-context questions where I need a bit more intelligence.
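
For anyone wondering why the 70B needs splitting on a 24 GB card while the 33B/34B models don't, here is a rough back-of-the-envelope sketch. The 4 bits per weight figure and the 2 GiB of headroom are assumptions for illustration only (the quantization actually used isn't stated above), and the function names are made up. It counts weights only, so long-context KV cache would need more room than this allows for.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float = 24.0,    # RTX 3090
    reserve_gib: float = 2.0,  # assumed headroom for cache/activations
) -> bool:
    """True if the weights fit in VRAM after setting aside the reserve."""
    return weights_gib(params_billion, bits_per_weight) <= vram_gib - reserve_gib


if __name__ == "__main__":
    # Assumed ~4 bits per weight for every model; real quants vary.
    for name, size_b in [("33B", 33), ("34B", 34), ("70B", 70)]:
        gib = weights_gib(size_b, 4.0)
        verdict = "fits" if fits_in_vram(size_b, 4.0) else "needs CPU offload"
        print(f"{name}: ~{gib:.1f} GiB of weights, {verdict}")
```

Under those assumptions, the 33B/34B weights come in around 15-16 GiB and the 70B around 33 GiB, which is roughly why only the 70B spills over to system RAM and slows down.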

And for reference, this is an SFF desktop! It's no MacBook, but still small enough (10L and flat) for me to fly with in a carry-on.