Hacker News new | ask | show | jobs
by blackeyeblitzar 783 days ago
I can understand the inference part being useful and practical for Apple devs. I’m just wondering about the training part, for which there Apple silicon devices don’t seem very useful.
2 comments

My M2 Max significantly outperforms my 3090 Ti for training a Mistral-7B LoRA. Its sort of a case-by-case situation though, as it depends on how optimized the CUDA kernels happen to be for whatever workload you're doing (i.e. for inference, theres a big delta between standard transformers vs exllamav2, apple silicon may outperform the former, but certainly not the latter).
I’ve seen several people fine tune mistral 7B on MacBooks.