The GPU version of libNC is available as a free binary, and you can find MIT-licensed source code implementing transformers (both training and inference) on top of libNC at https://bellard.org/nncp. (nncp was meant to be a submission to the Hutter Prize, which would have required open-sourcing libNC too, but it didn't qualify because it uses AVX2 and too much RAM. At least the CPU version binary is MIT licensed since ts_server is.) I think it wouldn't be that big a project to support LLaMA starting from that code, although it is dense code.
Countering SaaS-ification by others can also be achieved through the AGPL or through the BSL (initially not open source, converts to an open-source license after a set period).
I do believe one of the failures of the GPL was not being AGPL from the start.
You're right: `The CPU version is released as binary code under the MIT license`. That's quite an unusual choice; I'm not even sure how MIT would apply to that...
You can release whatever you want under MIT :) It grants you the right to use the binary for commercial projects (or any type of project you want), to modify the binary, to distribute it yourself, and more. You cannot hold Bellard/the license holder liable for anything related to it, and you must include the license and copyright notice if you distribute it.
Edit: licensing