Hacker News
by InvidFlower 459 days ago
While that is cool in principle, I'm not sure how well it'd actually work in reality. First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?

Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses. That would cover a lot of the fundamental knowledge and processes. Then you could fine-tune on copyrighted data. That might actually make it easier to see how much influence that content has on the final weights, but it would also probably be a lot less influence. There's a big difference between a painting of an apple being the main contribution to the concept of "apple" in an image model, vs. a mention of that painting corresponding to a few weights that just reference a bunch of other concepts that were learned via open data.
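
To make the "how much influence" question concrete, here is a toy sketch of one possible definition: leave-one-out influence, i.e. refit without a given item and see how far the weight moves. Everything here is an assumption for illustration (the one-weight model y = w*x, the data, the function names); it is not how any real system attributes training data, and refitting once per item would likely be impractical for a large model.

```python
def fit_weight(points):
    """Least-squares fit of the single weight w in y = w * x."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / sum_xx


def leave_one_out_influence(points):
    """For each point, how far the fitted weight moves when that point is dropped."""
    w_full = fit_weight(points)
    influences = []
    for i in range(len(points)):
        rest = points[:i] + points[i + 1:]  # training set without item i
        influences.append(abs(w_full - fit_weight(rest)))
    return influences


# Hypothetical data: three points on the line y = x, plus one that is far off it.
data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 8.0)]
scores = leave_one_out_influence(data)
most_influential = max(range(len(data)), key=lambda i: scores[i])
```

In this toy case the off-line point moves the weight the most, because the other points largely agree with each other. That is the same intuition as above: content that only restates what the rest of the data already taught ends up with little measurable influence.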

1 comment

> First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?

Well, Bing AI already knows where it drew the information from and cites sources, so it would be a matter of making the deal.

How to enforce it? That's the main question, I reckon.

> Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses.

I agree.