Hacker News
by
atemerev
633 days ago
IDK, 8B-class quantized models run pretty fast on commodity laptops, with CPU-only inference. Thanks to the people who figured out quantization and reimplemented everything in C++, instead of academic-grade Python.
actualwitch
633 days ago
A solid chunk of Python is just wrappers around C/C++, most tensor frameworks included.
atemerev
633 days ago
I know, and yet early model implementations were quite unoptimized compared to the modern ones.