HN Mirror



	by scelerat 906 days ago
	For someone who is totally clueless: I can see it's faster than ChatGPT at responding to the same question. What are some relevant speed metrics? Output tokens per second? How about the number of input tokens: does that matter, and how does that factor in?

1 comment

tome 906 days ago

The number of input tokens is important because the bigger the context length, the better. (I think our demo here has 4096 tokens of context.) But in terms of compute, the important factor is how quickly you can generate the output. You want both low latency and high throughput.
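
To make "low latency and high throughput" concrete, here is a rough Python sketch of how those two numbers are commonly derived from timestamps around a single streamed response. The class, the function names, and the example timings are all hypothetical, made up for illustration; they are not measurements from the demo or part of any real API.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Timestamps (in seconds) for one streamed response. Hypothetical structure."""
    request_sent: float   # when the prompt was submitted
    first_token: float    # when the first output token arrived
    last_token: float     # when the final output token arrived
    output_tokens: int    # number of output tokens generated


def time_to_first_token(t: GenerationTiming) -> float:
    """Latency: how long the user waits before any output appears."""
    return t.first_token - t.request_sent


def output_tokens_per_second(t: GenerationTiming) -> float:
    """Throughput: tokens generated per second once output has started."""
    decode_time = t.last_token - t.first_token
    if t.output_tokens < 2 or decode_time <= 0:
        raise ValueError("need at least two tokens and a positive decode time")
    # The first token marks the start of the interval, so count the rest.
    return (t.output_tokens - 1) / decode_time


# Made-up example: 401 tokens, first after 0.25 s, last after 2.25 s.
timing = GenerationTiming(request_sent=0.0, first_token=0.25,
                          last_token=2.25, output_tokens=401)
ttft = time_to_first_token(timing)
rate = output_tokens_per_second(timing)
print(f"time to first token: {ttft:.2f} s, output rate: {rate:.0f} tokens/s")
```

A longer prompt would typically show up in the first number (more input to process before the first token appears), while the second number mostly reflects generation speed.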
