by irodov_rg 2712 days ago
It's mostly the lookup table that takes up the space. This work is about splitting that table into two layers and then continuing to train to regain accuracy. The resulting model is 90% smaller than the original.
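To make the size argument concrete, here is a rough sketch of the parameter arithmetic. It assumes "2 layers" means a low-rank split of a V x D lookup table into a V x k table followed by a k x D projection, which is only one reading of the description. The sizes and function names below are hypothetical, not taken from the work.

```python
# Sketch only: assumes the split is a low-rank factorization of the table.
# All sizes below are hypothetical and chosen purely for illustration.

def table_params(vocab_size: int, embed_dim: int) -> int:
    """Parameter count of a single V x D lookup table."""
    return vocab_size * embed_dim


def two_layer_params(vocab_size: int, embed_dim: int, bottleneck: int) -> int:
    """Parameter count of a V x k table followed by a k x D projection."""
    return vocab_size * bottleneck + bottleneck * embed_dim


def size_reduction(vocab_size: int, embed_dim: int, bottleneck: int) -> float:
    """Fraction of lookup-table parameters removed by the two-layer split."""
    full = table_params(vocab_size, embed_dim)
    split = two_layer_params(vocab_size, embed_dim, bottleneck)
    return 1.0 - split / full


if __name__ == "__main__":
    vocab_size, embed_dim, bottleneck = 100_000, 300, 30  # hypothetical
    print("full table:", table_params(vocab_size, embed_dim))
    print("two layers:", two_layer_params(vocab_size, embed_dim, bottleneck))
    print(f"reduction:  {size_reduction(vocab_size, embed_dim, bottleneck):.1%}")
```

With a bottleneck around a tenth of the embedding width, the first layer dominates the count, so the table shrinks by roughly the same factor. That only covers the table itself. The whole-model figure depends on how much of the model the table accounts for.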