by marci
3 hours ago
But with Apple's AFM 3 architecture, we might end up with huge, SOTA-adjacent models on devices with limited RAM. They use a technique where you only load between 1B and 4B parameters of a 20B dense model for an entire prompt run, rather than picking a subset token by token like a MoE, and run mostly on the low-power ANE instead of the GPU cores. Now, imagine if/when they scale up to 100B or more, on a chip using 2W?
If someone could split the models up by more specific tasks, e.g. a "spellchecker AI", and get these working as well as Sonnet 4.6-4.8 on those tasks on a personal laptop, you'd then question the $100 a month fee.
Bear in mind these laptops are likely to be $5000 or so because of the memory, HDD and M7 chip they'd need.
It feels to me like the beginning of the inflection point, but software updates, not hardware updates, will be the accelerant.