by hackerlight
843 days ago
> this would likely require an additional tensor channel, so that each "feature" carries information in a high enough dimensional space

Suppose the input data is [batch_size, num_features]. Then you do x.unsqueeze(-1), giving you [batch_size, num_features, 1]. Then what?
einsum('bf,fc->bfc', batched_inputs, channel_embedding)
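Here channel_embedding has shape [num_features, num_channels], so the result is [batch_size, num_features, num_channels]. Then carry that through the network and project it down at the end. It's roughly equivalent to the token embedding step in an LLM.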
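A minimal PyTorch sketch of the idea, assuming tabular input of shape [batch_size, num_features]. The class name FeatureChannelMLP, the shared residual MLP over the channel dimension, and the flatten-then-linear output head are hypothetical choices for illustration, not something the comment specifies. Note that the einsum is just broadcasting: x.unsqueeze(-1) * channel_embedding produces the same [batch_size, num_features, num_channels] tensor.

```python
import torch
import torch.nn as nn


class FeatureChannelMLP(nn.Module):
    """Hypothetical sketch: lift each scalar feature into `num_channels` dims,
    process the channel vectors, then project back down to one output."""

    def __init__(self, num_features: int, num_channels: int, hidden: int = 64):
        super().__init__()
        # Learned per-feature embedding, shape [num_features, num_channels];
        # this plays the role of `channel_embedding` in the einsum.
        self.channel_embedding = nn.Parameter(
            torch.randn(num_features, num_channels) * 0.02
        )
        # Assumption: one MLP shared across features, acting on the channel dim.
        self.body = nn.Sequential(
            nn.Linear(num_channels, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_channels),
        )
        # Assumption: project down by flattening features x channels to one output.
        self.head = nn.Linear(num_features * num_channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch_size, num_features] -> h: [batch_size, num_features, num_channels]
        h = torch.einsum("bf,fc->bfc", x, self.channel_embedding)
        # Residual MLP; nn.Linear acts on the last (channel) dimension.
        h = h + self.body(h)
        # [batch_size, num_features * num_channels] -> [batch_size, 1]
        return self.head(h.flatten(1))
```

Usage: FeatureChannelMLP(num_features=10, num_channels=16)(torch.randn(4, 10)) returns a [4, 1] tensor. Flattening before the head is just the simplest way to "project it down"; pooling over the feature axis would be another option.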