by vkaku
328 days ago
This was written entirely by Grok 3.0. The focus was on demonstrating training, inference, and attention all in one file. It runs on a GPU thanks to CuPy, so no custom kernel needs to be written for any of it. I think more people can experiment with different attention mechanisms and models and try training them on their own computers. That is the post.
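For anyone wondering what "attention without writing a kernel" can look like, here is a minimal sketch. It is not the code from the linked file; the function names (`softmax`, `causal_attention`) and the NumPy fallback are my own assumptions for illustration. The idea is that CuPy mirrors the NumPy array API, so the same array code can run on a GPU when CuPy and a device are available.

```python
# Illustrative sketch only, not the code from the post.
# Assumption: use CuPy if it imports and a GPU is usable, else fall back to NumPy.
try:
    import cupy as xp
    xp.zeros(1)  # touches the device; raises if no usable GPU
except Exception:
    import numpy as xp


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = xp.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(q, k, v):
    # q, k, v: arrays of shape (seq_len, d). Returns (output, weights).
    seq_len, d = q.shape
    scores = (q @ k.T) / (d ** 0.5)  # scaled dot-product scores
    # Mask future positions: token i may only attend to tokens j <= i.
    idx = xp.arange(seq_len)
    mask = idx[None, :] > idx[:, None]
    scores[mask] = -xp.inf
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Swapping in a different attention variant mostly means changing how `scores` and `mask` are built, which is the kind of tinkering the single-file setup is meant to make easy.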