by scottmf
145 days ago
I independently did the same with an MLX implementation on Sunday (also with Claude Code). I expected this C implementation to be notably faster, but my M3 Max (36GB) could barely make it past the first denoising step before OOMing (at 512x512). Am I doing something wrong? The MLX implementation takes ~1 sec per step with the same model and dimensions: https://x.com/scottinallcaps/status/2013187218718753032