Hacker News new | ask | show | jobs
by CasperDern 1600 days ago
They used a fixed size transformer, where the vocab determines the functions and input/output range. So unless the model needs more 'memory' for your class of expression there wouldn't necessarily be a big change in performance. They have experiments in the paper with bigger/smaller vocabs.