Hacker News
by pilotneko 804 days ago
I experimented with this model and vLLM around a month ago. The long context length is attractive, but it was incredibly slow on a g5.12xlarge (4 NVIDIA A10G GPUs). I actually could not get it to respond for single examples longer than 50K tokens.