by shakna 3247 days ago
The data is also public, at:

https://dotnetcli.blob.core.windows.net/usagedata/dotnet-cli...

So, the latest would be:

https://dotnetcli.blob.core.windows.net/usagedata/dotnet-cli...


I think it's noteworthy that they even include command line arguments that are mistyped, for example "bulid".

What happens if you accidentally paste an AWS secret key or similar in the middle of a command line argument? Will that too appear in public csv files a year later?

Hi. Team member here. We used a simple algorithm to prevent that: we essentially let the data itself vote on what counts as a real command, for exactly this reason. "bulid" appears because a lot of people typed it, so it passed the vote. I don't have a count, but many rows were excluded from the data because they didn't meet the minimum threshold for being a real command. Imagine you spelled "build" backwards for some reason. That would have been quite uncommon, so it would not have passed.
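For readers who want to picture the "vote", here is a minimal sketch of a frequency-threshold filter. It is not the team's actual code: the function name, the sample rows, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter


def filter_common_commands(rows, min_count):
    """Keep only rows whose command string occurs at least min_count times.

    Hypothetical helper: the real threshold and data layout are not
    described in this thread.
    """
    counts = Counter(rows)
    return [row for row in rows if counts[row] >= min_count]


# Made-up sample: a common verb, a common typo, and two one-off strings.
rows = ["build"] * 5 + ["bulid"] * 3 + ["dliub", "not-a-real-secret"]
kept = filter_common_commands(rows, min_count=3)
```

With this made-up sample, "build" and "bulid" survive because each clears the threshold, while the two one-off strings are dropped. A pasted secret would normally be a one-off string too, which is why a count-based filter keeps it out.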
This is one of my objections. There are also no test cases for this piece of code.
They don't include command line arguments (yet). They include the command verb (dotnet [build/restore/etc]) that was run.
Well, it looks like they are including "command verbs" even if they are mistyped, for example "bulid".

What happens if you accidentally paste an AWS secret key or similar in the middle of a command verb? Will that too appear in public csv files a year later?

See my comment to the grandparent comment on our approach to only including common command strings (which wouldn't include anyone's AWS key). Also, and more importantly, we will only collect known arguments. From the blog post:

> Only known arguments and options will be collected (not arbitrary strings).

We don't want your AWS secret key in this data any more than you do. We have put systematic mitigations in place to ensure that this doesn't happen.
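The "only known arguments" rule quoted above amounts to an allow-list. A rough sketch of that idea follows. The option names in the set and the function name are assumptions chosen for illustration, not the list the CLI actually uses.

```python
# Hypothetical allow-list; the real set of collected options is not given here.
KNOWN_OPTIONS = frozenset({"--configuration", "--verbosity", "--no-restore"})


def collectable_options(argv):
    """Return only the tokens that exactly match a known option name.

    Option values and any unrecognized strings are dropped, so an
    arbitrary pasted string never reaches the collected data.
    """
    return [token for token in argv if token in KNOWN_OPTIONS]


sample = ["--configuration", "Release", "not-a-real-secret", "--no-restore"]
collected = collectable_options(sample)
```

Here only `--configuration` and `--no-restore` are kept. Unlike the frequency vote, an allow-list does not depend on how often a string appears, so even a secret that many people pasted would be excluded.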