by nl
1060 days ago
That is a fair question, and I'm also not sure a simple metric like perplexity would pick it up. However, if perplexity showed a smaller drop-off under quantization when using this modified softmax, I think that would be an exciting finding, and enough to show that further experiments are worth doing. But you are right: if it doesn't show an improvement, that doesn't necessarily rule out that it could be helping.

Edit: In the Qualcomm AI paper mentioned in this post, they experiment on BERT uncased (109M params) and OPT-125M, and they are able to show the effects using perplexity. I hadn't read the paper when I suggested the same approach, so I guess that is good validation that it is worth trying.

Edit2: Actually, they also test on ViT (22M params), which I think would be even quicker to try.
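A rough sketch of the comparison I have in mind, in plain Python. It assumes the modified softmax is the "softmax1" variant that adds 1 to the denominator, and that you can collect per-token log-probabilities from a full-precision run and a quantized run of each model. The helper names and the simple symmetric per-tensor int8 fake-quantizer are my own illustrative choices, not the Qualcomm paper's setup.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Standard softmax; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax1(logits: Sequence[float]) -> List[float]:
    """ASSUMED variant: exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 in the denominator lets every weight shrink towards zero,
    so a head can attend to "nothing". Shifting by m = max(max(x), 0)
    keeps it numerically stable: the 1 becomes exp(-m).
    """
    m = max(max(logits), 0.0)
    exps = [math.exp(x - m) for x in logits]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]


def fake_quantize_int8(values: Sequence[float]) -> List[float]:
    """Hypothetical helper: symmetric per-tensor int8 round trip.

    Large outliers inflate the scale, which is what hurts everything else.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def quantization_dropoff(fp_log_probs: Sequence[float],
                         quant_log_probs: Sequence[float]) -> float:
    """Relative perplexity increase caused by quantization; 0.0 means no loss."""
    return perplexity(quant_log_probs) / perplexity(fp_log_probs) - 1.0
```

The idea would be to train two otherwise identical models, one with each softmax, evaluate both before and after quantization, and compare their `quantization_dropoff` values. A noticeably lower drop-off for the softmax1 model would be the exciting result; no difference would be inconclusive rather than negative, for the reasons above.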