HN Mirror



Not at the VRAM sizes that determine how much context you can load; also, GPUs aren't as efficient as direct inference.
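Here is a rough sketch of why VRAM caps context. It assumes a standard transformer that keeps a key/value (KV) cache: one K and one V tensor per layer, per KV head, per token. It ignores activations, framework overhead and fragmentation, so treat the result as an upper bound. The model shape in the example (32 layers, 8 KV heads, head_dim 128, fp16) and the 24 GiB / 16 GiB figures are hypothetical, not tied to any card mentioned here.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size in bytes (bytes_per_elem=2 assumes fp16/bf16)."""
    # Factor of 2: each layer stores both a key and a value vector per KV head per token.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def max_context_tokens(vram_bytes: int, weight_bytes: int, n_layers: int,
                       n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Upper bound on tokens whose KV cache fits in the VRAM left after loading weights."""
    free = vram_bytes - weight_bytes
    if free <= 0:
        return 0  # The weights alone don't fit, so there is no room for any context.
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return free // per_token


GiB = 1024 ** 3
# Hypothetical card and model: 24 GiB VRAM, 16 GiB of weights.
print(kv_cache_bytes(32, 8, 128, 1))                      # bytes per token
print(max_context_tokens(24 * GiB, 16 * GiB, 32, 8, 128))  # tokens that fit
```

With those made-up numbers each token costs 128 KiB of cache, so the 8 GiB left after the weights holds about 65,536 tokens. Halving the VRAM headroom halves the usable context, which is the sense in which VRAM size "controls" it.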

OK, B70.