|
|
|
|
|
by zozbot234
2 hours ago
|
|
Maybe we shouldn't be running these models on laptops with their thermally constrained form factor, and we shouldn't expect quick inference on a par with a large cloud-based platform either, at least not for near-SOTA model quality. It's still worth it to avoid becoming massively reliant on centralized services. |
|
This setup is extremely optimized down to the last flag. Changing any param above the temp flag craters performance.
I don't have enough system RAM to properly handle the large context windows so I don't use local models.