Hi, I'm the person who wrote that sizing comment in the draft for this article. I've been trying for a while to get 405B running on any of the GPU machines, without success. I suspect I'd need a raw 8xA100 node to do it at Q4, and I doubt there is any reasonable combination of L40S cards that can do it on fly.io. It's just too big. I suspect that in time the 70B model will be brought up to be roughly equivalent, but realistically it's already at the GPT-4 threshold as is. I've found that 70B is more than sufficient in practice.
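
For anyone who wants to check the sizing themselves, here is a rough back-of-the-envelope sketch in Python. It rests on several assumptions that are mine for illustration: Q4 is treated as exactly 4 bits per weight (real quantization formats usually spend a bit more), the 20% overhead factor for KV cache and activations is a guess, and the card sizes are the published 48 GB for an L40S and 80 GB for the larger A100 variant. The function names `weight_gb` and `min_cards` are hypothetical.

```python
import math


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


def min_cards(params_billion: float, bits_per_weight: float,
              card_gb: float, overhead: float = 1.2) -> int:
    """Lower bound on the number of cards needed to hold the model.

    `overhead` is an assumed multiplier for KV cache, activations and
    runtime buffers; it is a guess, not a measured value.
    """
    needed_gb = weight_gb(params_billion, bits_per_weight) * overhead
    return math.ceil(needed_gb / card_gb)


if __name__ == "__main__":
    print(weight_gb(405, 4))  # 202.5 GB of weights for 405B at 4 bits
    print(weight_gb(70, 4))   # 35.0 GB of weights for 70B at 4 bits
    for name, card_gb in [("L40S", 48), ("A100 80GB", 80)]:
        print(name, "405B:", min_cards(405, 4, card_gb),
              "70B:", min_cards(70, 4, card_gb))
```

Treat the card counts as a floor rather than a recipe. They ignore interconnect bandwidth, long context windows, and whether a provider actually offers a machine with that many cards, which is where the practical trouble tends to be.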