It is a reasonable bet that OpenAI had some powerful models along the lines of Chris Ré's group's work well before Mamba came out. They hired people with the right background, and the CUDA optimization would certainly not be a problem for OpenAI. My main question is whether it makes sense to scale up an already huge model 32x during training, compared to other ideas for increasing capacity at scale.