by heisenburgzero
900 days ago
Not entirely related, but does anyone know where I can find articles or papers that discuss why transformers, while acting as mere "next-token predictors", can handle prompts involving:
1. Unknown words (or subwords/tokens) that were not seen in the training dataset.
Example: Create a table with "sdsfs_ff", "fsdf_value" as columns in pandas.
2. Examples made up by the user (unseen in the training dataset), where the LLM is told to provide similar output.
I have a feeling this should be a common question, but I just can't find the right keywords to search for.
PS. If anyone has links to a thorough discussion of positional embeddings, that would be great. I never got a satisfying answer about the use of sine / cosine and (multiplication vs. addition).