by avisoori1x
827 days ago
A from-scratch implementation of a sparse mixture of experts language model in a single PyTorch file. It is inspired by and largely based on Andrej Karpathy's project 'makemore', and borrows a number of reusable components from that implementation. Like makemore, makeMoE is an autoregressive character-level language model, but it uses a sparse mixture of experts architecture. I added expert capacity to this implementation to make it more complete.
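To make the terms "sparse" and "expert capacity" concrete, here is a minimal, framework-free sketch of top-k routing with a per-expert capacity limit. It is an illustration only, not makeMoE's actual code: the function names, the capacity formula (`num_tokens * k / num_experts * capacity_factor`), and the policy of dropping tokens that arrive after an expert is full are all assumptions chosen for clarity.

```python
import math


def top_k_route(logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_indices, gate_weights), where the gate weights are a
    softmax taken over the selected k logits only.
    """
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[e] - peak) for e in top]
    total = sum(exps)
    return top, [x / total for x in exps]


def route_with_capacity(router_logits, num_experts, k=2, capacity_factor=1.0):
    """Assign tokens to experts, capping how many tokens each expert accepts.

    router_logits: one list of per-expert scores for each token.
    Assumed capacity rule (hypothetical, for illustration):
        capacity = int(num_tokens * k / num_experts * capacity_factor)
    Tokens are processed in order; an assignment is dropped once the chosen
    expert has reached its capacity.
    """
    num_tokens = len(router_logits)
    capacity = max(1, int(num_tokens * k / num_experts * capacity_factor))
    load = [0] * num_experts  # tokens accepted so far, per expert
    assignments = []          # per token: list of (expert, gate_weight)
    for logits in router_logits:
        experts, weights = top_k_route(logits, k)
        kept = []
        for expert, weight in zip(experts, weights):
            if load[expert] < capacity:
                load[expert] += 1
                kept.append((expert, weight))
        assignments.append(kept)
    return assignments, load, capacity
```

With four tokens that all prefer the same one of two experts, `k=1`, and `capacity_factor=1.0`, the capacity works out to 2, so only the first two tokens are routed and the remaining two are dropped. In a real model a dropped token typically still passes through the residual connection, so it is skipped by the expert layer rather than lost.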