Hacker News
by redlock, 505 days ago
The issue here is that even if you have enough VRAM to run the model, it will still be too slow with a large context. (For example, LLaMA 70B takes minutes just to process a prompt of 30k+ tokens.)
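A rough back-of-envelope sketch of why prompt processing gets that slow. It uses the common approximation of about 2 FLOPs per parameter per token for a forward pass, which ignores the attention cost that grows with context length, so real times would be somewhat higher. The throughput figures are illustrative assumptions, not measurements of any particular hardware, and `prefill_seconds` is a hypothetical helper name.

```python
def prefill_seconds(n_params: float, prompt_tokens: int, effective_flops: float) -> float:
    """Estimate prompt-processing time in seconds.

    Assumes ~2 FLOPs per parameter per prompt token (dense forward pass),
    ignoring the extra attention cost that grows with context length.
    """
    total_flops = 2.0 * n_params * prompt_tokens
    return total_flops / effective_flops


if __name__ == "__main__":
    n_params = 70e9          # 70B-parameter model
    prompt_tokens = 30_000   # 30k-token prompt

    # Assumed *effective* throughput values (hypothetical, not benchmarks).
    for tflops in (10, 20, 50):
        secs = prefill_seconds(n_params, prompt_tokens, tflops * 1e12)
        print(f"{tflops} TFLOP/s effective -> about {secs / 60:.1f} min")
```

Under these assumptions the prompt alone costs about 4.2e15 FLOPs, so anything short of tens of effective TFLOP/s lands in the multi-minute range before the first output token appears. More VRAM lets the model fit, but it does not reduce this compute.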