Hacker News
by
ComplexSystems
1204 days ago
How are you getting this to run fast? I'm on a top-of-the-line M1 MBP and getting 1 token every 8 minutes.
3 comments
ingenieroariel
1204 days ago
Try switching all the .cuda() calls to .to("mps") (PyTorch has no .mps() method; the MPS backend is selected as a device). I got a 100x speedup on a different language model on a MacBook M1 Air.
https://pytorch.org/docs/stable/notes/mps.html
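For anyone trying the device switch suggested above, here is a minimal sketch, assuming a PyTorch build with MPS support (1.12 or later). The helper names `pick_device` and `detect_device` are made up for illustration and are not part of PyTorch. It falls back to the CPU when neither CUDA nor MPS is usable.

```python
import importlib.util


def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then the CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the local PyTorch install; report "cpu" if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    mps = getattr(torch.backends, "mps", None)  # absent on older builds
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    device = detect_device()
    print("using device:", device)
    # Instead of model.cuda() / tensor.cuda(), move things explicitly:
    #   model = model.to(device)
    #   tokens = tokens.to(device)
```

Keeping the device in one variable means the same script should run on a CUDA box and on an M1 without further edits, though whether a given model's operators are all supported on MPS is a separate question.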
singularity2001
1204 days ago
dedicated fork:
https://github.com/remixer-dec/llama-mps
markasoftware
1204 days ago
PyTorch is probably heavily optimized for x86; it's likely using lots of SIMD and whatnot. I'm sure it's possible to get similar performance on M1 Macs, but not with the current version of PyTorch.
Do you have enough RAM (i.e., is it not swapping to disk)?
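A rough way to sanity-check the RAM question is to compare the size of the weights with physical memory. The sketch below is only back-of-the-envelope arithmetic: the 7 billion parameter count, the 2 bytes per parameter (fp16), and the 0.8 headroom factor are illustrative assumptions, not figures from this thread, and the helper names are hypothetical.

```python
import os


def weights_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the weights, ignoring activations."""
    return n_params * bytes_per_param


def physical_ram_bytes():
    """Total physical RAM in bytes, or None where sysconf cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages < 0 or page_size < 0:
        return None
    return pages * page_size


def fits_in_ram(n_params: int, bytes_per_param: int,
                ram_bytes: int, headroom: float = 0.8) -> bool:
    """True if the weights fit in a fraction (headroom) of RAM."""
    return weights_bytes(n_params, bytes_per_param) <= ram_bytes * headroom


if __name__ == "__main__":
    n_params = 7_000_000_000  # assumed model size, for illustration only
    ram = physical_ram_bytes()
    print("weights (fp16):", weights_bytes(n_params, 2) / 2**30, "GiB")
    if ram is not None:
        print("physical RAM:", ram / 2**30, "GiB")
        print("fits with headroom:", fits_in_ram(n_params, 2, ram))
```

If the weights do not fit, the OS will page to disk, and that alone could explain minutes per token regardless of which backend is in use.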
jwitthuhn
1203 days ago
Same experience for me; it looks like it is only using one CPU core instead of all of them.
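One way to check the single-core observation is to compare the number of logical cores with the intra-op thread count PyTorch reports. This is a diagnostic sketch with hypothetical helper names; `torch.get_num_threads` and `torch.set_num_threads` are real PyTorch calls, but nothing in this thread confirms that raising the thread count fixes the slowdown.

```python
import importlib.util
import os


def thread_report() -> dict:
    """Logical core count plus PyTorch's intra-op thread count, if installed."""
    report = {"logical_cores": os.cpu_count() or 1, "torch_threads": None}
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_threads"] = torch.get_num_threads()
    return report


def use_all_cores() -> int:
    """Ask PyTorch (when installed) to use one thread per logical core."""
    cores = os.cpu_count() or 1
    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.set_num_threads(cores)
    return cores


if __name__ == "__main__":
    print(thread_report())
    print("requested threads:", use_all_cores())
```

If `torch_threads` already matches the core count and only one core is busy, the bottleneck is probably elsewhere, for example in swapping or in an operator that is not parallelized.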