by badFEengineer
913 days ago
This was surprisingly fast: 276.27 T/s (although Llama 2 70B is noticeably worse than GPT-4 Turbo). I'm curious whether there are good benchmarks for inference tokens per second. I imagine the numbers differ between throughput optimization and single-inference optimization, but I'd like to know if there's an analysis of this somewhere.

edit: I re-ran the same prompt on Perplexity's llama-2-70b and I'm getting 59 tokens per second there.