> to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes
I wouldn't call these terms permissive. It's in line with the recent trend in released AI models, but fairly restrictive in what you're actually allowed to do with it.
The Instruct model has that non-commercial restriction, but I'm not sure why. They say it was trained with Alpaca-formatted questions and responses, but I'm not sure if that includes the original Alpaca dataset.
I believe that is more related to how the default Huggingface inference UI is prompting. Running locally with the correct prompt template it gives default completes, eg
I guess you could come up with a thousand example prompts and pay some students to pick which output is better, but I can also see why you wouldn't bother. It probably depends on language, type of prompt, etc.
Sure it's easy -- you can use benchmarks like HumanEval, which Stability did. They just didn't compare to Codex or GPT-4. Of course such benchmarks don't capture all aspects of an LLM's capabilities, but they're a lot better than nothing!
One could team up with Hackerrank/leetcode, let the model code in the interface (maybe there's an API for that already, no idea), execute their code verbatim and see how many test cases they get right the first time around. Then, like for humans, give them a clue about one of the tests no passing (or code not working, too slow, etc.). Give points based on the difficulty of the question and the number of clues needed.
I guess the obvious caveat is that these model are probably overfitted on these types of questions. But a specific benchmark could be made containing question kept secret for models. Time to build "Botrank" I guess.
> On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.
Can you put those numbers into context for those who haven't done HumanEval? Are those percentages so that 40+ means 40+% and 26 is 26%? If so does that imply both would be failing scores?
https://huggingface.co/stabilityai/stablecode-instruct-alpha...
You can “run it locally”. Very handy if you do not trust automatically sending all your code to someone in the United States.