|
|
|
|
|
by russdill
109 days ago
|
|
I've experimented with having different sized LLMs cooperating. The smaller LLM starts a response while the larger LLM is starting. It's fed the initial response so it can continue it. The idea of having an LLM follow and continuously predict the speaker. It would allow a response to be continually generated. If the prediction is correct, the response can be started with zero latency. |
|
(Meanwhile at OpenAI: testing out the free ChatGPT, it feels like they prompted GPT 3.5 to write at length based on the last one or maybe two prompts)