by cyanydeez 7 days ago
not at the VRAM sizes that control how much context to load; also, GPUs aren't as efficient as direct inference.
1 comment

OK, B70.