Re the ethics part, something I haven't quite figured out myself yet:

On one hand, training isn't "copying" per se but "learning", so maybe it isn't straight-up copyright infringement, unless the model can reproduce large parts of a work verbatim. It could also let small teams and individuals have a much larger impact on the world, and it could lower the barrier to entry for research and experimentation, maybe for other endeavors too. It could certainly help with knowledge sharing and accessibility, where the downstream creativity and usefulness might outweigh the diffuse harm to individuals. If it expands the creative field rather than shrinking it, that'd be a good thing.

On the other hand, many models are trained on datasets built from copyrighted works without permission or royalties, and the availability of LLMs could reduce demand for human work and threaten people's livelihoods, eroding fields instead of expanding them. Most releases today are fairly opaque about their training data. Most datasets are undisclosed, so it's hard if not impossible for authors to have any say over whether their work is included. If that continues, cultural production might become hard to sustain, and that'd be good for no one.

So what's the best approach for someone who doesn't want to forfeit the usefulness they personally get out of these tools, but also doesn't want to go directly against what the ethical considerations bring up? In the end I don't know if there's an easy or right side to take. I guess the optimum usually sits somewhere around the middle, or at least not at the extremes.