	by tome 910 days ago
	The number of input tokens matters because the longer the context length, the better. (I think our demo here has 4096 tokens of context.) In terms of compute, though, the important factor is how quickly you can generate the output. You want both low latency and high throughput.
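
To make the latency/throughput distinction concrete, here is a rough Python sketch that derives both from token arrival timestamps. The function name `generation_metrics` and the timings are hypothetical, made up purely for illustration; they are not measurements from the demo or any real system.

```python
def generation_metrics(request_start, token_times):
    """Return (time_to_first_token, tokens_per_second) for one response.

    request_start: time the request was sent, in seconds.
    token_times: arrival time of each output token, in seconds, ascending.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    # Latency: how long the user waits before any output appears.
    time_to_first_token = token_times[0] - request_start
    # Throughput: output tokens delivered per second over the whole response.
    total_time = token_times[-1] - request_start
    if total_time <= 0:
        raise ValueError("last token must arrive after the request start")
    tokens_per_second = len(token_times) / total_time
    return time_to_first_token, tokens_per_second


# Hypothetical timings (illustrative only): 4 tokens arriving every 0.5 s.
ttft, tps = generation_metrics(0.0, [0.5, 1.0, 1.5, 2.0])
print(f"time to first token: {ttft:.2f} s, throughput: {tps:.2f} tokens/s")
```

The two numbers can move independently: a system can start answering quickly but stream slowly, or take a while to start and then emit tokens very fast, which is why both are worth tracking.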