Hacker News new | ask | show | jobs
by aukejw 407 days ago
How do you determine peak memory usage? Just look at activity monitor?

I've yet to find a good overview of how much memory each model needs for different context lengths (other than back of the envelope #weights * bits). LM Studio warns you if a model will likely not fit, but it's not very exact.

1 comments

MLX reports peak memory usage at the end of the response. Otherwise I'll use Activity Monitor.
I'm also trusting `get_peak_memory` + some small buffer for now.

Still, it reports accurate peak memory usage for tensors living on GPU, but seems to miss some of the non-Metal overhead, however small (https://github.com/aukejw/mlx_transformers_benchmark/issues/...).