Hacker News new | ask | show | jobs
by LelouBil 1217 days ago
> Amazon's CodeWhisperer has a "reference tracker" that tells you the license of training data code if the generated response is within some similarity threshold, but that's still not good enough imo.

I don't think it's possible to do better than that with this technology.

1 comments

like I probably don't understand this in the right way, but I could have sworn we had the ability to probe latent space on models like these and make mappings based on them? Or was that only for diffusers?
Sure, you could build an index of attribution <> latent space coords, but it would not be clear whether a generated document near several index entries would require compliance.

I guess this is where the threshold comes from. Choose a generous margin and over-attribute rather than under-attribute.