by CasperDern
1600 days ago
They used a fixed-size transformer, where the vocab determines the available functions and the input/output range. So unless the model needs more 'memory' for your class of expressions, there wouldn't necessarily be a big change in performance. The paper has experiments with bigger and smaller vocabs.