by dgreensp
1035 days ago
My understanding (not an expert) is that the time for an LLM to produce an output is linear in the length of the output, but may not be linear in the length of the input (i.e., the context). It may be quadratic in the context length, or better than that if the model uses some kind of fancy attention optimization.
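To make that concrete, here is a rough back-of-the-envelope sketch. It assumes plain causal self-attention with a key/value cache and counts only query-key score computations, ignoring everything else a real model does. The function names `prefill_pairs` and `decode_pairs` are made up for illustration and do not come from any library.

```python
def prefill_pairs(context_len: int) -> int:
    """Query-key pairs scored while reading a prompt of context_len tokens.

    Assumes full causal attention: token i attends to tokens 0..i,
    which is i + 1 pairs, so the total grows quadratically.
    """
    return context_len * (context_len + 1) // 2


def decode_pairs(context_len: int, output_len: int) -> int:
    """Query-key pairs scored while generating output_len new tokens.

    Assumes a key/value cache, so each new token only scores itself
    against the tokens already seen (prompt plus earlier output).
    """
    return sum(context_len + j + 1 for j in range(output_len))


# Doubling the context roughly quadruples the prompt-processing work.
print(prefill_pairs(1000))     # 500500
print(prefill_pairs(2000))     # 2001000

# Doubling the output roughly doubles the generation work
# when the context is much longer than the output.
print(decode_pairs(1000, 10))  # 10055
print(decode_pairs(1000, 20))  # 20210
```

Under these assumptions the generation cost is only approximately linear in the output: each new token also attends to the output produced so far, so a small quadratic term appears once the output is long relative to the context. Optimized attention variants would change the prompt-side count, which this sketch does not model.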