by mathiaspoint 297 days ago
Why would you throw out the original embedding layer? That seems like a step backwards to me. The original model was likely trained partly on Yiddish, and without its embedding layer you're throwing out a lot of the information stored in the rest of the model.