by sporkl
121 days ago
The framework used in the book, malt[0], is not currently GPU-accelerated, but that's being worked on. In case it's of interest, I used it for a toy implementation of the GPT architecture[1] in about 500 lines.

(I studied with one of the authors, Dr. Daniel Friedman; I wasn't super involved here, but I proofread a late draft and TA'd for a course based on the book.)

[0]: https://github.com/themetaschemer/malt
[1]: https://github.com/sporkl/malt-transformer