by DavidSJ 983 days ago
Neither does GPT-4 or other sparse mixtures of experts, such as Switch Transformers [1].

[1] https://arxiv.org/abs/2101.03961