by avereveard 1174 days ago
The Alpaca methodology has proven powerful, and it's being applied to models with better licensing. It's hard to track lineage, but I think the OpenAssistant models are the most permissive at the moment: they use an openly sourced dataset to build an instruct model on top of Pythia, which is itself a GPT-NeoX model trained on a deduplicated version of the well-known Pile dataset.

The problem is that verifying the licensing claims for these composed solutions is becoming exceedingly hard.

1 comment

Almost everything in AI now breaks America's copyright principles.