HN Mirror



	by brandonb 265 days ago
	Can you explain how? If I'm understanding the paper right, the time series encoder is a Conv1D and the cross-attention layer is constrained to output into the token space of a pre-trained LLM. My naive expectation is that these constraints would make the model less expressive and harder to fine-tune to pick up on these types of subtle signals. But obviously ML is an empirical field, so if you found that a constrained architecture worked well in practice, that's an interesting result in its own right.

1 comment

RealLast 265 days ago

Sure! There is more after the 1D conv: a transformer encoder that extracts further features of the time series. The LLM can then basically query this encoder for information, which also lets it capture more subtle patterns. In a way, it's similar to how some vision-language models work.
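
To make the "LLM queries the encoder" idea concrete, here is a rough, dependency-free sketch. It is not the paper's code: the function names, kernels, and toy query vectors are all made up for illustration, there are no learned projections, and the transformer encoder that sits between the conv and the cross-attention is left out to keep it short.

```python
import math

def conv1d(series, kernels, stride=1):
    """Valid 1D convolution: one output feature per kernel at each window position."""
    width = len(kernels[0])
    features = []
    for start in range(0, len(series) - width + 1, stride):
        window = series[start:start + width]
        features.append([sum(w * x for w, x in zip(k, window)) for k in kernels])
    return features

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, memory):
    """Scaled dot-product attention where each query attends over `memory`.

    Assumption for brevity: keys and values are both the raw memory vectors,
    with no learned projection matrices.
    """
    dim = len(memory[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(dim) for m in memory]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * m[d] for wi, m in zip(w, memory)) for d in range(dim)])
    return outputs, weights

# Toy time series with a bump in the middle.
series = [0.0, 0.1, 0.0, 0.9, 1.0, 0.1, 0.0, 0.0]
# Two hand-picked kernels (hypothetical): a local sum and a local difference.
kernels = [[1.0, 1.0], [-1.0, 1.0]]
features = conv1d(series, kernels)

# Stand-ins for LLM-side hidden states acting as queries (hypothetical values).
queries = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = cross_attention(queries, features)
```

The first query weights the window with the largest local sum most heavily, and the second favors the steepest rise, which is the sense in which the language-model side can pull different information out of the same encoded series.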
