by two_in_one
871 days ago
From the post:

> I implemented imperative code that does what I’m proposing the transformer is doing. It produces outputs very similar to the transformer.

This means there is probably a way to bypass transformers and get the same results. It would be interesting to see whether that is more efficient. For example, given a foundation model, one could train something else from it and run that on a much smaller device.