by herpderperator
25 days ago
The visualiser seems quite naive about what it counts as a token. I don't think a token is an entire word as often as the demo shows, and when it gets to the `def estimate_tokens` method, the entire `# Rough heuristic: ~1 token per 4 chars of English` comment is printed all at once as a single token, which is certainly not accurate. This is not a realistic replay of what a common LLM would actually emit; the token boundaries are entirely fabricated. For the purpose of getting a feel for tokens per second, though, I suppose it's good enough.
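
To put a number on it, here is a minimal sketch of the heuristic that comment describes. Only the function name and the "~1 token per 4 chars" rule come from the demo; the body, the `chars_per_token` parameter, and the rounding are my assumptions, not the demo's actual code.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough heuristic: ~1 token per 4 characters of English text.

    Hypothetical body: the demo's real implementation is not shown,
    so the rounding and the default divisor are assumptions.
    """
    if not text:
        return 0
    # Round up so that any non-empty string counts as at least one token.
    return math.ceil(len(text) / chars_per_token)


comment = "# Rough heuristic: ~1 token per 4 chars of English"
print(len(comment), "chars ->", estimate_tokens(comment), "estimated tokens")
```

By the demo's own rule of thumb, that comment is well over one token, so streaming it out in a single step contradicts the heuristic it quotes. A real tokenizer will give a different count again, since this is only an approximation.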
[1] https://dave.ly/tools/tokenflow/
[2] https://platform.openai.com/tokenizer