|
|
|
|
|
by 0xDA7A
645 days ago
|
|
I think the part I find most interesting about this is the potential power implications. Ternary models may perform better in terms of RAM and that's great, but if you manage to build a multiplication-free accelerator in silicon, you can start thinking about running things like vision models in < 0.1W of power. This could have insane implications for edge capabilities, robots with massively better swarm dynamics, smart glasses with super low latency speech to text, etc. I think the biggest technical hurdle would be simulating the non linear layers in an efficient way, but you can also solve that since you already re-train your models and could use custom activation functions that better approximate a HW efficient non linear layer. |
|