by joelthelion
947 days ago
> My prediction would be that this will develop in the way that we can soon buy $1 hardware accelerators for things like word embedding, grammar, and general language understanding. And then you need those expensive GPUs only for the last few layers of your LLM, thereby massively reducing deployment costs.

You'd still need a lot of RAM for storing these weights, wouldn't you? I mean, obviously, a $1 accelerator is a great improvement over $x,000 GPUs, but it doesn't mean we all get LLMs working on our phones just yet.
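To put the RAM point in rough numbers, here is a back-of-the-envelope sketch. The 7-billion-parameter model size and the bit widths are hypothetical values picked for illustration; nothing here comes from the paper, and it only counts the weights themselves.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
n_params = 7e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")
```

Even with aggressive quantization, that is still gigabytes of memory to provision, whatever the accelerator itself costs.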
EDIT: And given that this work is centered around energy efficiency and was sponsored by Huawei, I would guess that LLMs on your phone are precisely the goal here.
EDIT2: The process node they used for their calculations appears to match that of Google's TPUv3, which achieves 0.56 TOPS/W. The paper claims 161 TOPS/W, which would be a roughly 290x improvement (161 / 0.56 ≈ 287.5) in energy efficiency over the AI chips in Pixel phones.
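The ratio can be checked directly from the two TOPS/W figures quoted above. This is only a sanity check that takes both numbers at face value; the variable and function names are mine.

```python
# Both figures are the ones quoted in the comment, taken at face value.
tpu_v3_tops_per_watt = 0.56   # TPUv3 efficiency as cited in the comment
paper_tops_per_watt = 161.0   # efficiency claimed by the paper


def efficiency_ratio(new: float, baseline: float) -> float:
    """Return how many times more efficient `new` is than `baseline`."""
    return new / baseline


ratio = efficiency_ratio(paper_tops_per_watt, tpu_v3_tops_per_watt)
print(f"{ratio:.1f}x")  # prints "287.5x"
```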