Hacker News
by dev_tools_lab 82 days ago
Thanks for this project. Prioritizing MoE models and adding an intelligent NVMe cache could improve efficiency, especially on the M4 Max, where the available bandwidth makes this kind of usage more realistic.