by isaacimagine
361 days ago
No mention of Decision Transformers or Trajectory Transformers? Both are offline approaches that tend to do very well at long-horizon tasks, as they bypass the credit assignment problem by virtue of having an attention mechanism. Most RL researchers consider these approaches not to be "real RL", as they can't assign credit outside the context window, and therefore can't learn infinite-horizon tasks. With 1M+ context windows, perhaps this is less of an issue in practice? Curious to hear thoughts.

DT: https://arxiv.org/abs/2106.01345

TT: https://arxiv.org/abs/2106.02039
|
The hardness of the credit assignment problem is a statement about data sparsity. Architecture choices do not "bypass" it.