by gajjanag
502 days ago
This is much more nuanced now. See Apple's "Private Cloud Compute" (https://security.apple.com/blog/private-cloud-compute/): they run a lot of the larger models on their own servers. Fundamentally, it is more efficient to process a batch of tokens from multiple users/requests on a server than to process the tokens of a single user's request on device.
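
To put rough numbers on the batching point, here is a back-of-the-envelope sketch in Python. It assumes decoding is limited by memory bandwidth, so each decode step streams the full set of weights once no matter how many requests share that step. The 3B parameter count and 2 bytes per parameter are made-up illustrative values, not Apple's figures, and the function name is hypothetical; KV-cache and activation traffic are ignored.

```python
def weight_bytes_read_per_token(num_params: int, bytes_per_param: int, batch_size: int) -> float:
    """Weight bytes streamed from memory per generated token.

    Assumes one decode step reads every weight once and produces one token
    for each of the `batch_size` sequences in the batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return num_params * bytes_per_param / batch_size


if __name__ == "__main__":
    # Hypothetical model: 3 billion parameters stored at 2 bytes each.
    for batch_size in (1, 8, 64):
        gb = weight_bytes_read_per_token(3_000_000_000, 2, batch_size) / 1e9
        print(f"batch={batch_size:>2}: {gb} GB of weights read per token")
```

Under these assumptions, batch size 1 (the on-device case) pays the full weight read for every token, while a server batching many requests splits that same read across all of them.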