Hacker News
by c1sc0, 752 days ago
They don’t need to if they can process a significant chunk of the queries on-device. Llama3-level inference works fine on M2-level chips today and the M4 is already in a mobile device.