by thomasahle
763 days ago
Maybe you could just use a good old 1D CNN for the bottom 3-4 layers. By then, the model will have combined characters into roughly token-length chunks anyway. Just make sure to also have some big MLPs at the start, to enrich the "tokens" with the information currently stored in the embedding tables.
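A rough sketch of what that front end might look like, in plain Python with random, untrained weights. Several details here are my own assumptions, not part of the suggestion: it works on UTF-8 bytes, each conv layer uses kernel size 2 and stride 2 (so after L layers one position summarizes up to 2^L bytes), the nonlinearity is ReLU, and the "big MLP" is a residual feed-forward layer with a 4x wider hidden layer. `ByteConvFrontEnd` is a hypothetical name.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ByteConvFrontEnd:
    """Hypothetical byte-level front end: strided 1D convs, then a wide MLP."""

    def __init__(self, d_model=16, n_conv=3, mlp_mult=4, seed=0):
        rng = random.Random(seed)
        self.d = d_model
        # Byte embedding table: only 256 rows, instead of a large token vocabulary.
        self.embed = rand_matrix(rng, 256, d_model, 1.0)
        # Each conv layer (assumed kernel=2, stride=2) maps two adjacent
        # d-dim vectors, concatenated into 2*d values, down to one d-dim vector.
        self.convs = [
            rand_matrix(rng, d_model, 2 * d_model, (2 * d_model) ** -0.5)
            for _ in range(n_conv)
        ]
        # Wide position-wise MLP meant to play the role of the old embedding table.
        hidden = mlp_mult * d_model
        self.w_in = rand_matrix(rng, hidden, d_model, d_model ** -0.5)
        self.w_out = rand_matrix(rng, d_model, hidden, hidden ** -0.5)

    def conv_layer(self, W, seq):
        # Zero-pad odd-length sequences so every window has two inputs.
        if len(seq) % 2:
            seq = seq + [[0.0] * self.d]
        return [relu(matvec(W, seq[i] + seq[i + 1])) for i in range(0, len(seq), 2)]

    def mlp(self, x):
        h = relu(matvec(self.w_in, x))
        # Residual connection around the MLP.
        return [a + b for a, b in zip(x, matvec(self.w_out, h))]

    def __call__(self, text):
        seq = [self.embed[b] for b in text.encode("utf-8")]
        for W in self.convs:
            seq = self.conv_layer(W, seq)
        # Each remaining position is a learned "token" covering up to 2**n_conv bytes.
        return [self.mlp(x) for x in seq]


model = ByteConvFrontEnd(d_model=16, n_conv=3)
chunks = model("hello world, bytes!")
print(len(chunks), len(chunks[0]))  # 3 16  (19 bytes -> 10 -> 5 -> 3 positions)
```

With 3 stride-2 layers each position covers up to 8 bytes, a bit longer than a typical English BPE token; using stride 1 in some layers would shorten the chunks while still growing the receptive field.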