by Syeposxr 969 days ago
Could you explain how the majority of your corpus is under CC BY 4.0? I realise that's the licence you have picked on HuggingFace, but if the source data was not already CC BY 4.0, how are you able to re-license it as CC BY 4.0?
1 comment

The majority of the source data was already licensed under CC BY 4.0. Additionally, the Corpus, as a work constituting a curated and post-processed collation of other works, is also licensed under CC BY 4.0.