Hacker News
by IanCal 1206 days ago
You have to split it up, which slows it down a lot. The 14B model doesn't fit fully on a 3090, though the 7B fits easily and is very fast. The other replies may either have meant this or thought the original comment was about llama.
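
For a rough sense of why the 14B doesn't fit, here's a back-of-the-envelope sketch. It assumes 16-bit weights (2 bytes per parameter), which is my assumption since the precision isn't stated above, and it only counts the weights, not activations or any cache. The function name is made up for illustration.

```python
# Rough weight-only estimate. Assumption: 16-bit weights (2 bytes/parameter).
# Ignores activations, caches, and framework overhead, so real usage is higher.
GPU_VRAM_GB = 24  # an RTX 3090 has 24 GB of VRAM


def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


for size in (7, 14):
    need = weight_gb(size)
    verdict = "fits" if need < GPU_VRAM_GB else "does not fit"
    print(f"{size}B: ~{need:.0f} GB of weights, {verdict} in {GPU_VRAM_GB} GB")
```

Under that assumption the 7B weights leave plenty of headroom, while the 14B weights alone already exceed the card, so part of the model has to live somewhere else.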