by alkonaut 1809 days ago
This is a good point. There is a lot of outrage now, but the product when finished might have every single wrinkle removed.

This one, for example, seems like it should be pretty easy to fix. You could even make a hack that replaces ALL sufficiently long and sufficiently random strings with garbage/zeroes at the point of recall. The difference from the case of regurgitating GPL sources is that the fact that something looks like an API key can be deduced from Copilot's output alone, so you don't need to track it through the system like you would with a system of attribution.
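A rough sketch of that hack, to make it concrete. Everything here is an assumption for illustration: the names `scrub_suggestion` and `shannon_entropy`, the two thresholds, and the choice of per-character Shannon entropy as the measure of "sufficiently random". It only looks at simple double-quoted literals and says nothing about how Copilot actually works.

```python
import math
import re
from collections import Counter

# Assumed thresholds; the comment above does not specify any.
MIN_LENGTH = 20
MIN_ENTROPY_BITS = 3.5

# Matches double-quoted string literals without embedded quotes or escapes.
STRING_LITERAL = re.compile(r'"([^"\\]*)"')


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scrub_suggestion(suggestion: str) -> str:
    """Replace long, high-entropy string literals with zeroes of equal length."""
    def replace(match):
        body = match.group(1)
        if len(body) >= MIN_LENGTH and shannon_entropy(body) >= MIN_ENTROPY_BITS:
            return '"' + "0" * len(body) + '"'
        # Short or repetitive literals are passed through untouched.
        return match.group(0)

    return STRING_LITERAL.sub(replace, suggestion)
```

The check runs purely on the suggested text, which is the point being made: no provenance tracking is needed. The thresholds would need tuning, since ordinary long natural-language strings can also score fairly high on per-character entropy.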


How do you tell a “long and random string” from a base64 encoded PNG file or embedded script or…
You don't. The logic is unchanged if the data changes. A snippet of code would be unchanged, apart from the data.

    // Add an arrow icon
    var arrow_icon = base64decode("00000000000000000...");
    add_image(arrow_icon);
   
That is: this approach is only viable if one assumes that "code" and "data" are distinct, and that data can be treated as an irrelevant placeholder. In the example above, I was after the code to add the icon, not the icon payload itself.

There are obvious borderline cases, like large numeric constants that are actually a core part of the logic. E.g. a method that multiplies by Pi with 14 digits wouldn't work very well if the digits were replaced by zeroes. So most likely numerical constants would need to be left alone.

Often times secrets are numerical constants. In your own example, the icon is a base64-encoded number. How would you tell secret numbers apart from the rest?
Base64 isn't numeric it's alphanumeric. The only reason this is reasonable (again) is that almost all secrets like API keys or complex passwords maximize their information content and are therefore alphanumeric (or better). Base64-encoded data does too, and is an innocent casualty of that censorship.
> Base64 isn't numeric it's alphanumeric.

They meant that a number written in hex (base16) is still a number, even though you use some letters. Similarly, a number written in base64 is still a number.

Yes, and in that case I’d like it if Copilot erased/replaced any string over (say) 15 or 20 characters unless it matches [0..9]+.

Obviously any string is a potential number in some encoding so the only encoding to exclude would be decimal.
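For illustration, that rule could look something like the sketch below. The function name `redact_long_literals`, the cutoff of 15 (picked arbitrarily from the "15 or 20" above), and the restriction to simple double-quoted literals are all assumptions, not anything Copilot is known to do.

```python
import re

# Assumed cutoff: strings longer than this are redacted unless purely decimal.
MAX_KEPT_LENGTH = 15

# Matches double-quoted string literals without embedded quotes or escapes.
STRING_LITERAL = re.compile(r'"([^"\\]*)"')


def redact_long_literals(code: str, max_kept_length: int = MAX_KEPT_LENGTH) -> str:
    """Zero out long string literals, exempting strings of decimal digits only."""
    def replace(match):
        body = match.group(1)
        is_decimal = re.fullmatch(r"[0-9]+", body) is not None
        if len(body) > max_kept_length and not is_decimal:
            return '"' + "0" * len(body) + '"'
        # Short strings and decimal-only strings are left as they are.
        return match.group(0)

    return STRING_LITERAL.sub(replace, code)
```

With this rule a long decimal string survives, while a hex-encoded number or a base64 payload of the same length is zeroed out, which is the "innocent casualty" trade-off described earlier in the thread.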

Should any of those be autocompleted?