|
|
|
|
|
by bionhoward
626 days ago
|
|
Yes, I believe this is possible, you could clone weights of one or more existing models and fine tune them in groups with different random seeds for noise/drop to produce reasonable outputs under a differential transformer decoding scheme whereby tokens with disagreement receive more attention (surprisal analysis) |
|