by terafo 1202 days ago
It fits. whisper.cpp uses 4-bit quantization; the 13B model takes a little more than 8 GB, and around 9 GB of RAM during inference.
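For anyone who wants to sanity-check the "a little more than 8 GB" figure, here is a rough back-of-the-envelope sketch. The block size of 32 weights and the 32-bit scale per block are my assumptions about the quantization layout, not something stated above, and `quantized_size_gb` is just a hypothetical helper name.

```python
def quantized_size_gb(n_params, bits_per_weight=4, block_size=32, scale_bits=32):
    """Estimate the size of quantized weights in decimal GB.

    Assumes (hypothetically) that weights are stored in blocks of
    `block_size`, each block carrying one `scale_bits`-wide scale factor.
    """
    # Per-weight cost = the quantized value plus its share of the block scale.
    effective_bits = bits_per_weight + scale_bits / block_size
    return n_params * effective_bits / 8 / 1e9


# 13B parameters under the assumed layout (5 effective bits per weight).
with_scales = quantized_size_gb(13e9)
# Same model counting only the raw 4-bit values, with no per-block scales.
raw_4bit = quantized_size_gb(13e9, scale_bits=0)

print(f"with per-block scales: {with_scales:.3f} GB")
print(f"raw 4-bit values only: {raw_4bit:.3f} GB")
```

Under those assumptions the weights alone land slightly above 8 GB. The remaining gap up to roughly 9 GB of RAM would be runtime overhead such as the context cache and scratch buffers, which this estimate does not model.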