by kuprel
851 days ago
This adds PyTorch/CUDA training support to Andrej Karpathy's minbpe. It takes 2min 28sec (148 seconds) on an RTX 4090 to train the BasicTokenizer with a vocab_size of 512 on 307MB of Enron emails. The original code takes about 2hrs 15min (8076 seconds) on an M2 Air with Python 3.11 to do the same training run. That is roughly a 55x speedup.
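
For anyone who wants to check the arithmetic, here is a quick sketch that recomputes the ratio from the two timings quoted above. The variable names are just for illustration, and the numbers are copied from the post rather than measured again.

```python
# Timings quoted in the post, in seconds.
gpu_seconds = 2 * 60 + 28   # 2min 28sec on the RTX 4090
cpu_seconds = 8076          # original code on the M2 Air

# Ratio of the two wall-clock times.
speedup = cpu_seconds / gpu_seconds
print(f"GPU run: {gpu_seconds} s")
print(f"Speedup: {speedup:.1f}x")

# Break the CPU time down into hours, minutes, seconds.
h, rem = divmod(cpu_seconds, 3600)
m, s = divmod(rem, 60)
print(f"CPU run: {h}h {m}m {s}s")
```

The ratio rounds to 55x, and 8076 seconds is a little under the 2hrs 15min quoted. Note that this only compares the two reported wall-clock times; it says nothing about how the two implementations would compare on the same machine.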
If so, this doesn’t seem like a logical comparison, and the 55x claim would likely not hold when both versions are run on the same hardware.