by imustachyou
1163 days ago
S4 and its class of state-space models are an impressive mathematical and signal-processing innovation, and I thought it was awesome how they destroyed previous baselines for long-range tasks. Have any state-space models been adapted for arbitrary text generation? Language models like ChatGPT are trained to predict the next word from the previous ones and are excellent at generation, a harder task than translation or classification. I'm doubtful about how well text models that deal with fixed-size inputs and outputs would adapt, since their architecture isn't as natural a fit for generating indefinitely long sequences.