by doctoboggan
990 days ago
How do any of these sliding window techniques handle instructions that are unexpected and only show up at the end? For example, imagine feeding a book to the model with the last sentence being the instruction “return the count of the letter m in the previous input”. A human would handle this by first letting out an exasperated sigh, then restarting the reading while counting. An LLM has no ability to loop back and re-read the input. (Ignore LLM issues with character counting for this example.) It seems like to solve this problem for real, the LLM needs to be able to loop and jump arbitrarily, but I’m sure that would introduce a whole new host of issues and possibly require a new architecture altogether.
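To make the "restart the reading while counting" step concrete, here is a rough Python sketch of what an outer loop around a windowed reader might look like. It is only an illustration under stated assumptions: `sliding_windows`, `answer_late_instruction`, and `count_m` are hypothetical names, the windows are plain non-overlapping character slices, and a trivial counting function stands in for the model.

```python
from typing import Callable, List


def sliding_windows(text: str, size: int) -> List[str]:
    """Split text into non-overlapping windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def answer_late_instruction(book: str, window_size: int,
                            task: Callable[[str], int]) -> int:
    """Second pass over the input, made once the task is finally known.

    Only one window is "in context" at a time; the running total is the
    only state carried from one window to the next.
    """
    total = 0
    for window in sliding_windows(book, window_size):
        total += task(window)
    return total


def count_m(window: str) -> int:
    """Stand-in for the model: count the letter 'm' in one window."""
    return window.count("m")


book = "a summer morning in the mammoth museum"
result = answer_late_instruction(book, window_size=8, task=count_m)
```

The point of the sketch is that the loop lives outside the reader: something has to notice the late instruction and replay the input, which is exactly the ability a single forward pass lacks.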
|
With the needed preprocessing, an LLM that can "go and do some research to adequately respond" could be extremely powerful.
We've spent the last ~10 millennia improving knowledge management technology to scale beyond the capacity/time of individual brains. Let the language model draw on actual research in that field and pre-digest its input, not just run a Bing search. No need for its short-term memory to remember which piece of code did what; just tag it when reading and rely on scalable shared indexing of tags.
Though the more I think about it, the more it sounds like normal LLM pretraining with the knowledge index being the giant chunk of LLM weights.
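For what it's worth, the "tag it when reading" idea can be sketched as a plain inverted index in Python. Everything here is an assumption for illustration: `TagIndex`, `add`, and `lookup` are hypothetical names, and the tags are supplied by hand rather than produced by a model.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TagIndex:
    """Toy inverted index: tag -> ids of the chunks carrying that tag."""

    def __init__(self) -> None:
        self._chunks: List[str] = []
        self._by_tag: Dict[str, Set[int]] = defaultdict(set)

    def add(self, chunk: str, tags: Set[str]) -> int:
        """Store a chunk once and register it under each of its tags."""
        chunk_id = len(self._chunks)
        self._chunks.append(chunk)
        for tag in tags:
            self._by_tag[tag.lower()].add(chunk_id)
        return chunk_id

    def lookup(self, tag: str) -> List[str]:
        """Return the chunks tagged with `tag`, in the order they were read."""
        ids = sorted(self._by_tag.get(tag.lower(), set()))
        return [self._chunks[i] for i in ids]


index = TagIndex()
index.add("def parse_config(path): ...", {"config", "parsing"})
index.add("def retry(fn, attempts): ...", {"networking", "retry"})
index.add("def load_yaml(path): ...", {"config", "yaml"})
hits = index.lookup("config")
```

The reader never has to hold the chunks in short-term memory; it only needs to know which tag to ask for later.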