by crazygringo
308 days ago
How does a prompt this long affect resource usage? Does inference need to process the whole thing from scratch at the start of every chat? Or is there some way to cache the state of the LLM after it has processed this prompt, before the first user token is received, so that every request starts from this cached state?