

by rented_mule 1799 days ago

Also not a lawyer, but I've been around ML for a while. The question makes perfect sense to me!

It takes some amount of time to comply with a takedown notice. For example, time passes between receiving Alice's notice and taking down Bob's repo.

I would expect Copilot's model(s) to be retrained periodically in order to remain relevant. The next retraining could exclude Alice's code. That might be a longer window than the case of the repo takedown, but as long as it doesn't take too long they might be okay?

There are incremental training approaches that evolve models over time rather than completely retraining them. In my experience, complete retraining is far more common, because the highly path-dependent nature of incremental training can lead to outcomes that are hard to manage. For example, what if you discover bad training data, like repos that collect anti-patterns? Or you receive Alice's takedown notice? You typically want your models to be able to "unsee" things, and that's hard with purely incremental training. Even when incremental approaches are used, there is often an occasional complete retraining to overcome such issues.
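To make the "unsee" problem concrete, here is a toy sketch of the difference between retraining from scratch and updating incrementally. Everything in it is an assumption made up for illustration: the one-parameter model, the repo names, the data, and the learning rate. It says nothing about how any real system is trained.

```python
# Toy model: y = w * x, fitted with plain SGD on squared error.
# All repo names and data points below are hypothetical.

def sgd(w, examples, lr=0.1):
    """One pass of SGD over the examples, starting from weight w."""
    for x, y in examples:
        w -= lr * (w * x - y) * x
    return w

corpus = {
    "bob":   [(1.0, 2.0), (2.0, 4.0)],   # consistent with w = 2
    "carol": [(1.0, 2.0), (3.0, 6.0)],   # consistent with w = 2
    "alice": [(1.0, 5.0), (2.0, 10.0)],  # consistent with w = 5, later taken down
}

def examples_from(names):
    """Concatenate the examples of the named repos, in order."""
    return [ex for name in names for ex in corpus[name]]

# Original model: trained on everything, Alice's repo included.
w_original = sgd(0.0, examples_from(["bob", "carol", "alice"]))

# Option 1: complete retraining from scratch, with Alice's repo excluded.
w_retrained = sgd(0.0, examples_from(["bob", "carol"]))

# Option 2: incremental update. Start from the existing model and keep
# training on the remaining repos only.
w_incremental = sgd(w_original, examples_from(["bob", "carol"]))

print(f"retrained from scratch: {w_retrained:.4f}")
print(f"incrementally updated:  {w_incremental:.4f}")
```

The two results differ (about 1.90 versus about 2.06 here). The retrained weight is a function of Bob's and Carol's data only, while the incrementally updated weight still depends on where the model was after seeing Alice's data. That dependence on history is the path dependence I mean, and it is why starting over is the simple way to guarantee that removed data has no remaining influence.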

To be clear, I have no idea what training approach is used for Copilot.

1 comment

bigwavedave 1798 days ago

That makes plenty of sense, thank you for the explanation!
