by nl
1066 days ago
Testing steps (based on thinking about this for 30 seconds, so they can probably be improved):

1. Train a Transformer-based model with and without the modified softmax (suggestions: GPT-2 or nanoGPT).
2. Measure performance. I'd probably start with perplexity and see if there is any difference (we'd expect little difference).
3. Quantize both models with different quantization strategies.
4. Measure the perplexity of the quantized models at different sizes.

If this is working, we'd expect performance to drop off more quickly for the non-modified model than for the modified one.
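A rough sketch of the pieces those steps need, in plain Python so it runs without any ML library. Everything here is an assumption for illustration: I'm taking "modified softmax" to mean the variant with an extra 1 in the denominator, the function names are made up, and the quantizer is a naive symmetric round-to-nearest scheme, not any particular library's method. A real test would use the training framework's own evaluation and quantization tools.

```python
import math

def softmax(xs):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(xs):
    # ASSUMED form of the "modified softmax": exp(x_i) / (1 + sum_j exp(x_j)).
    # Shifting by m turns the extra 1 into exp(-m); m >= 0 keeps it from overflowing.
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]

def perplexity(token_log_probs):
    # Perplexity = exp of the mean negative log-likelihood per token
    # (natural-log probabilities of the correct tokens).
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def quantize_roundtrip(weights, bits):
    # Hypothetical helper: symmetric per-tensor quantization to `bits` bits,
    # then back to floats, so the rounding error can be inspected.
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) * scale for w in weights]

if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    print("softmax sum:", sum(softmax(scores)))
    print("modified softmax sum:", sum(softmax_plus_one(scores)))
    print("perplexity:", perplexity([math.log(0.25)] * 4))
    print("8-bit round trip:", quantize_roundtrip([0.1, -0.37, 0.92, 0.5], 8))
```

The comparison itself would then be: compute perplexity for both trained models on the same held-out text, apply the same quantization at several bit widths to each, and recompute perplexity each time to see which curve degrades faster.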
|
In any case, that was an lmgtfy-level question. Here's what I found: https://til.simonwillison.net/llms/training-nanogpt-on-my-bl...
I shall try that soon.