Hacker News
by
low_tech_punk
538 days ago
Thanks! The 0.1B version looks perfect for embedded systems. What is the key benefit of an attention-free architecture?
pico_creator
537 days ago
Lower compute cost, especially over longer sequence lengths. Depending on context length, it's 10x, 100x, or even 1000x+ cheaper (quadratic vs. linear cost in sequence length).
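To make the quadratic vs. linear point concrete, here is a rough back-of-the-envelope sketch. The cost model is an assumption for illustration only, not a measurement of any particular model: attention-style token mixing is counted as n * n * d multiply-adds, a linear-time recurrent alternative is counted as one d x d state update per token (n * d * d), and the width d = 64 is a hypothetical value. Constants, memory traffic, and the rest of the network are ignored.

```python
def attention_mixing_cost(n: int, d: int) -> int:
    """Assumed cost: every token interacts with every token (n * n pairs),
    with d multiply-adds per pair."""
    return n * n * d


def linear_mixing_cost(n: int, d: int) -> int:
    """Assumed cost: one fixed-size (d x d) state update per token."""
    return n * d * d


def cost_ratio(n: int, d: int) -> float:
    """How many times cheaper the linear variant is under this toy model."""
    return attention_mixing_cost(n, d) / linear_mixing_cost(n, d)


if __name__ == "__main__":
    d = 64  # hypothetical width, chosen only for illustration
    for n in (640, 6_400, 64_000):
        print(f"context {n:>6}: about {cost_ratio(n, d):.0f}x cheaper")
```

Under these assumptions the ratio simplifies to n / d, so the gap keeps growing with context length, which is why the savings are quoted as a range rather than a single number.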