by haeggee 822 days ago
Very cool work! We did something similar as part of the Swiss AI Initiative (https://www.swiss-ai.org/), here: https://github.com/swiss-ai/MoE. The implementation is as simple and fast as nanoGPT, and it works with our modular llm-baselines codebase (https://github.com/epfml/llm-baselines) for experimenting with transformers and different datasets :)
1 comment

This is awesome! Thanks for sharing. I'll definitely check this out.