|
|
|
|
|
by tome
910 days ago
|
|
The number of input tokens is important because the bigger the context length the better. (I think our demo here is 4096 tokens of context.) But in terms of compute the important factor is how quickly you can generate the output. You want both low latency and high throughput. |
|