Seems like llama-derived models are flourishing. However, with llama licensed as an academic-only, noncommercial model, what is the path to bringing this to production for for-profit purposes?
The Alpaca methodology has proven powerful, and it's being applied to models with better licensing. It's hard to track lineage, but I think the OpenAssistant models are the most permissive at the moment: they use an openly sourced dataset to build an instruct model on top of Pythia, which itself is a GPT-NeoX model trained on a deduplicated version of the famous Pile dataset.
The problem is that verifying the licensing claims for these composed solutions is becoming exceedingly hard.
The Silicon Valley ethos has always been: do it first, worry about legality later. If you go bust, nobody will care. If you stay small, you will be ignored. If you go big, the lawyers will figure out a way to cut a deal.
Yes, I was speaking of the general ethos, not a specific case. But let's take Uber as an example of that ethos in action: Uber committed actual crimes as part of their growth strategy.
> llama is licensed as academic only and noncommercial model
Are weights even copyrightable? I was under the impression that they weren't (although this hasn't been tested in court, and there's a chance they may fall under database rights).