by BeefWellington
1542 days ago
A few years ago I did some work with IBM's Watson Twitter integration. One of the fun things you could do with it was sentiment analysis. It was reasonably accurate at the extremes, but anything in the gray area would be wildly off. A politely worded but scathing tweet would score high on the positive side of the scale, whereas a perfectly reasonable sentence that quoted someone's profanity would immediately score high on the negative side.

This part of the article made me chuckle, because IMO the author fell for some of the most basic language-processing smoke and mirrors:

"…so we’ll give it some examples. When generating the array, it even creates the ideal variable name and escapes the quotations."
Here, it generates toxic_comments as a variable name, when the instructions were:

# create an array with the following toxic comments: [etc]
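To make the parser claim concrete, here is a deliberately narrow, hypothetical sketch in Python (not from the article, and not how Copilot works): one regular expression that recognizes only this exact phrasing, turns the noun phrase into a snake_case variable name, and lets repr() produce properly quoted Python string literals. It assumes the list items are written in JSON syntax (double-quoted strings); the names PATTERN and comment_to_python are made up for illustration.

```python
import json
import re
from typing import Optional

# Hypothetical template: only matches comments phrased exactly like
# "# create an array with the following <noun phrase>: [<JSON list>]"
PATTERN = re.compile(
    r"#\s*create an array with the following\s+(?P<name>[a-z ]+?)\s*:\s*(?P<items>\[.*\])",
    re.IGNORECASE,
)


def comment_to_python(comment: str) -> Optional[str]:
    """Turn one narrowly phrased instruction comment into a Python assignment."""
    match = PATTERN.fullmatch(comment.strip())
    if match is None:
        return None  # any other phrasing is simply not handled
    # "toxic comments" -> "toxic_comments"
    name = "_".join(match.group("name").lower().split())
    # Assumes the list uses JSON syntax; json.loads raises on anything else
    items = json.loads(match.group("items"))
    # repr() emits valid Python string literals, escaping quotes where needed
    body = ", ".join(repr(item) for item in items)
    return f"{name} = [{body}]"


if __name__ == "__main__":
    comment = '# create an array with the following toxic comments: ["you suck", "this is \\"garbage\\""]'
    print(comment_to_python(comment))
```

Anything phrased differently falls through and returns None; a rule like this covers a single sentence shape, not free-form instructions.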
This is pretty basic language-parsing stuff that has probably been kicking around for a while. I think even the most basic English-language parser could output something along the lines of what was suggested, given an understanding of what valid Python looks like. While impressive, it's not nearly as interesting or as good as the rest of the work being done.

Copilot appears no different from most ML models out there. Poor and incomplete training data will yield OK results for popular things, but as soon as you ask for edge cases it will fall apart like Siri trying to understand a Scottish accent. Eventually it might get there with enough good, representative training data, but it's unclear to me how long that will take. If it tracks with speech-processing models, it might take decades or more.

Another consideration is that because the training is being done on public GitHub repos (at least last I read), it's likely ripe for abuse. If that's still how they're doing it, I'm looking forward to the TED Talk in two years from a researcher who "hacked" the Copilot AI by polluting its training data.

OK, I am waiting for you to propose a basic language parser that can do it. There's a reason we're only now having this debate: it was inconceivable 5 years ago, in the era of basic language parsers.