by danielmarkbruce
514 days ago
> You seem to think that predicting s t -> s t is easier than predicting st (single token) -> s t. Yes, it is significantly easier to train a model to do the first than the second across any real vocabulary. If you don't understand why, maybe go back to basics.
|
And ...
1) If the training data isn't there, the model still won't learn it.
2) Having to learn that the predictive signal is a multi-token pattern (s t) vs a single token one (st) isn't making things any simpler for the model.
Clearly you've decided to go by personal belief rather than actually testing it yourself, so this conversation is pointless.