by ummonk 65 days ago
I don't see why the transformer architecture can't be designed and trained with separate inputs for control data and content data.
2 comments

Give it a shot
Because it's all one (unexplainable) matrix of weights.
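
For anyone who does want to give it a shot, here is a rough sketch of one way "separate inputs" could look at the embedding layer: every position gets a channel id (control or content) whose embedding is added to the token embedding, similar in spirit to BERT's segment embeddings. The names `CONTROL`, `CONTENT`, `channel_emb`, and `embed` are made up for illustration, the tables are random rather than trained, and this is not any existing model's API.

```python
import random

random.seed(0)

VOCAB, D_MODEL = 100, 8
CONTROL, CONTENT = 0, 1  # hypothetical channel ids, not from any real model


def random_table(rows, cols):
    """Stand-in for a learned embedding table (random, untrained)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


token_emb = random_table(VOCAB, D_MODEL)   # one row per token id
channel_emb = random_table(2, D_MODEL)     # one row per channel


def embed(token_ids, channel_ids):
    """Return one vector per position: token embedding + channel embedding."""
    if len(token_ids) != len(channel_ids):
        raise ValueError("token_ids and channel_ids must have the same length")
    return [
        [t + c for t, c in zip(token_emb[tok], channel_emb[ch])]
        for tok, ch in zip(token_ids, channel_ids)
    ]


control = [5, 17, 42]     # e.g. instruction tokens
content = [5, 8, 99, 3]   # e.g. untrusted document tokens
tokens = control + content
channels = [CONTROL] * len(control) + [CONTENT] * len(content)

x = embed(tokens, channels)
print(len(x), len(x[0]))  # 7 8
```

Token 5 shows up in both channels and gets a different input vector in each, so the model can at least tell them apart. That only labels the input, though. Every later layer still mixes both channels through the same weights, so any difference in how control and content get treated would have to come out of training, which is the objection in the reply above.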