by brandonb
265 days ago
Can you explain how? If I'm understanding the paper right, the timeseries encoder is a Conv1D, and the cross-attention layer is constrained to produce outputs in the token space of a pre-trained LLM. My naive expectation is that these constraints would make the model less expressive, and harder to fine-tune, when it comes to picking up these kinds of subtle signals. But ML is an empirical field, so if you found that a constrained architecture works well in practice, that's an interesting result in its own right.
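To make concrete the kind of constraint I mean, here is a toy sketch under one assumed reading of the architecture. A Conv1D turns windows of the series into d-dimensional vectors, and a cross-attention layer uses those as queries over a frozen matrix of LLM token embeddings, which serve as both keys and values. The function names, sizes, random "vocabulary", and the keys = values simplification are my own assumptions for illustration, not the paper's code.

```python
import math
import random

def conv1d(x, kernels, stride):
    """Valid 1-D cross-correlation over a single-channel series.

    kernels: one weight list per output channel (all the same length).
    Returns one feature vector (length = number of channels) per window.
    """
    k = len(kernels[0])
    out = []
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]
        out.append([sum(w * v for w, v in zip(ch, window)) for ch in kernels])
    return out

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, token_embeddings):
    """Each query attends over a frozen set of token embeddings.

    Keys and values are both the token embeddings (an assumption), so each
    output is a convex combination of existing token vectors.
    """
    d = len(token_embeddings[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, e)) / math.sqrt(d)
                  for e in token_embeddings]
        weights = softmax(scores)
        outputs.append([sum(w * e[j] for w, e in zip(weights, token_embeddings))
                        for j in range(d)])
    return outputs

random.seed(0)
d = 4  # shared width of conv channels and the toy token embeddings
# Hypothetical stand-in for a frozen LLM vocabulary of 8 token embeddings.
vocab = [[random.gauss(0, 1) for _ in range(d)] for _ in range(8)]
kernels = [[random.gauss(0, 1) for _ in range(3)] for _ in range(d)]
series = [math.sin(t / 5) for t in range(30)]

features = conv1d(series, kernels, stride=3)  # 10 windows of length 3
tokens_out = cross_attend(features, vocab)
```

Because the softmax weights are non-negative and sum to 1, every output in this toy version is a weighted average of existing token embeddings. That is the sense in which I'd expect the encoder's output to be boxed in. Real implementations usually add learned query/key/value and output projections, which would loosen this considerably, so the practical effect is an empirical question.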