by ThePhysicist
124 days ago
This is really cool! I am trying to find a way to accelerate LLM inference for PII detection, where speed really matters because we want to process millions of log lines per minute. I am wondering how fast we could get e.g. Llama 3.1 to run on a conventional NVIDIA card? 10k tokens per second would be fantastic, but even 1k would be very useful.
|
Also, "10k tokens per second would be fantastic" might not be sufficient (not even remotely) if you want to "process millions of log lines per minute".
Assuming 2 million log lines per minute at just 100 tokens each, you need (100 * 2 million / 60) ~ 3.3 million tokens per second of processing speed :)
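
For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of the same arithmetic. The 2 million lines per minute and 100 tokens per line are the assumptions from above; the function names and the per-card throughputs (1k and 10k tokens per second, taken from the parent comment's wish list) are only illustrative, not measured figures for any real card or model.

```python
import math


def required_tokens_per_second(lines_per_minute: float, tokens_per_line: float) -> float:
    """Sustained token throughput needed to keep up with the log stream."""
    return lines_per_minute * tokens_per_line / 60


def cards_needed(required_tps: float, tps_per_card: float) -> int:
    """Number of cards needed, assuming throughput scales linearly (optimistic)."""
    return math.ceil(required_tps / tps_per_card)


# Assumed workload: 2 million log lines per minute, 100 tokens per line.
need = required_tokens_per_second(2_000_000, 100)
print(f"{need:,.0f} tokens/s required")

# Hypothetical per-card throughputs from the parent comment, not benchmarks.
for tps in (1_000, 10_000):
    print(f"at {tps:,} tokens/s per card: {cards_needed(need, tps):,} cards")
```

So even at the "fantastic" 10k tokens per second you would be looking at a few hundred cards under these assumptions, unless you can filter or sample the log lines first.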