Hacker News
by ttt3ts 903 days ago
For large models that don't fit in a single GPU's VRAM, you have to split the model and pass the context between GPUs. That often ends up slower. Also, tooling around AMD GPUs is still poor in comparison.
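
To make the hand-off cost concrete, here is a rough toy sketch of the idea, not a real inference setup. It assumes the model is split layer by layer across GPUs in order; the layer count, per-layer size, and VRAM figure are made-up illustration values, and `split_layers` / `handoffs` are hypothetical helper names.

```python
# Toy model of splitting a model across GPUs layer by layer.
# Whenever two consecutive layers live on different GPUs, the intermediate
# state has to be copied between them. All sizes are illustrative assumptions.

def split_layers(num_layers, layer_gb, vram_gb):
    """Assign layers to GPUs in order; return the GPU index for each layer."""
    per_gpu = int(vram_gb // layer_gb)  # whole layers that fit on one GPU
    if per_gpu < 1:
        raise ValueError("a single layer does not fit in VRAM")
    return [i // per_gpu for i in range(num_layers)]

def handoffs(placement):
    """Count GPU-to-GPU transfers needed for one forward pass."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a != b)

# Hypothetical numbers: 80 layers of 1.75 GB each on 24 GB cards.
placement = split_layers(num_layers=80, layer_gb=1.75, vram_gb=24)
print(max(placement) + 1, "GPUs,", handoffs(placement), "hand-offs per forward pass")
```

This ignores everything except weights (no KV cache, no activation memory), so treat it only as a picture of where the extra transfers come from. Each hand-off is a copy over the interconnect that a single-GPU run never pays.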