|
|
|
|
|
by gkucsko
1149 days ago
|
|
thanks, the model itself is a pretty vanilla gpt model based heavily on karpathy's nanogpt, so should not need too many bells and whistles to get it running on specific architectures. that said i have very little experience with platform specific development, so would looove some help from the community :) |
|
PyTorch themselves used nanoGPT training as demo for this: https://pytorch.org/blog/accelerating-large-language-models/