| HN Mirror



	by buildbot 783 days ago
	Does this support training on Apple silicon? It’s not very clear unless I missed something in the README.

2 comments

blackeyeblitzar 783 days ago

Would such a capability (training) be useful for anything other than small scale experimentation? Apple doesn’t make server products anymore and even when they did, they were overpriced. Unless they have private Apple silicon based servers for their own training needs?


jjtheblunt 783 days ago

Isn’t the current Mac Pro available in rack mount form?

https://www.apple.com/mac-pro/


donavanm 783 days ago

> Unless they have private Apple silicon based servers for their own training needs?

I'd be SHOCKED if so. It's been 15 years, but I was there when xserve died. Priorities were iphone > other mobile devices >>> laptops > displays & desktops >>> literally anything else. When xserve died we still needed osx for OD & similar. Teams moved on to 3P rack mount trays of mac minis as a stop gap. Any internal support/preference for server style hardware was a lolwut response. Externally I see no reason to suspect that's changed.


MBCook 783 days ago

There are an insane number of Apple Silicon devices out there.

If your product runs on an iPhone or iPad, I’m sure this is great.

If you only ever want to run on 4090s or other server stuff, yeah this probably isn’t that interesting.

Maybe it’s a good design for the tools or something, I have no experience to know. Maybe someone else can build off it.

But it makes sense Apple is releasing tools to make stuff that works better on Apple platforms.


blackeyeblitzar 783 days ago

I can understand the inference part being useful and practical for Apple devs. I’m just wondering about the training part, for which Apple silicon devices don’t seem very useful.


spmurrayzzz 782 days ago

My M2 Max significantly outperforms my 3090 Ti for training a Mistral-7B LoRA. It's sort of a case-by-case situation though, as it depends on how optimized the CUDA kernels happen to be for whatever workload you're doing (i.e. for inference, there's a big delta between standard transformers vs exllamav2; Apple silicon may outperform the former, but certainly not the latter).


rgbrgb 783 days ago

I’ve seen several people fine tune mistral 7B on MacBooks.


zmk5 783 days ago

I believe the MLX examples allow for it. Seems like a general purpose framework rather than a Mac specific one.


gbickford 783 days ago

I couldn't find any training code in the MLX examples.
