Hacker News
by scelerat 906 days ago
As someone who is totally clueless, I can see it's faster than ChatGPT at responding to the same question.

What are some relevant speed metrics? Output tokens per second? How about the number of input tokens: does that matter, and if so, how does it factor in?

1 comment

The number of input tokens is important because the bigger the context length, the better. (I think our demo here has 4096 tokens of context.) But in terms of compute, the important factor is how quickly you can generate the output. You want both low latency and high throughput.
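
To make "low latency and high throughput" concrete, here is a rough sketch of how you might measure both on any streaming token source. It assumes latency means time to the first output token and throughput means output tokens per second over the whole response, which is one common reading and not necessarily how the demo measures it. `measure_stream` and `fake_model` are hypothetical names made up for illustration, not part of any real API.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report latency and throughput.

    Assumption: latency = time to first token, throughput = tokens / total time.
    """
    start = clock()
    first_token_at = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_at is None:
            first_token_at = now  # timestamp of the first output token
        count += 1
    total = clock() - start
    return {
        "output_tokens": count,
        "time_to_first_token_s": (
            first_token_at - start if first_token_at is not None else None
        ),
        "total_time_s": total,
        "tokens_per_second": count / total if total > 0 else 0.0,
    }


def fake_model(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields one token per delay."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_model(n_tokens=20, delay_s=0.005)))
```

Running the same prompt through two systems and comparing `time_to_first_token_s` and `tokens_per_second` gives a like-for-like comparison, as long as the prompt and the output length are held the same.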