Hacker News
by
memossy
843 days ago
800M is good for mobile, 8B for graphics cards.
Bigger than that is also possible; scaling isn't saturated yet, but it needs more GPUs.
2 comments
anon373839
843 days ago
Do you know how the memory demands compare to LLMs at the same number of parameters? For example, Mistral 7B quantized to 4 bits works very well on an 8GB card, though there isn’t room for long context.
vorticalbox
843 days ago
You can also use quantisation, which lowers memory requirements at a small loss of performance.
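For a rough sense of the numbers discussed above, here is a back-of-the-envelope sketch of weight memory at different precisions. It counts weights only (parameters × bits / 8) and ignores activations, context/KV cache, and framework overhead, so real usage will be higher. The function name `weight_memory_gb` and the choice of bit widths are illustrative assumptions, not something stated in the thread.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts mentioned in the thread; bit widths are illustrative.
for name, params in [("800M", 0.8e9), ("7B", 7e9), ("8B", 8e9)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.1f} GB for weights")
```

By this estimate a 7B model at 4 bits needs about 3.5 GB for weights, which is consistent with it fitting on an 8GB card with limited room left for long context.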