| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by BeefWellington 1589 days ago

A few years ago I did some work with IBM's Watson Twitter integration. One of the fun things you could do was sentiment analysis. It was reasonably accurate for the extremes but anything in the gray area would be wildly off. A politely worded tweet that was scathing would come across high on the positive sides of the scale, whereas a perfectly reasonable sentence that included profanity as used in a quote would immediately be high on the negatives.

This part from the article made me chuckle, because IMO the author fell for some of the most basic language processing smoke & mirrors:

    …so we’ll give it some examples. When generating the array, it even creates the ideal variable name and escapes the quotations.

Here, it generates toxic_comments as a variable name, when the instructions were:

   # create an array with the following toxic comments: [etc]

This is pretty basic language parsing stuff that might have been kicking around awhile. I think the most basic english language parser could output something along the lines of what was suggested, given an understanding of what valid Python should look like. While impressive, it's not nearly as interesting or good as the rest of the work being done.

Copilot appears no different to most ML models out there. Poor and incomplete training data will yield ok results for popular things but as soon as you ask for edge cases it will fall apart like Siri trying to understand a Scottish accent.

Eventually it might get there with enough good representative training data but it's unclear to me how long that will take. If it tracks with speech processing models it might take decades plus.

Another consideration is that because the training data is being done using github public repos (at least last I read), it's likely that it's ripe for abuse. If that's still how they're doing it I'm looking forward to the TEDTalk in two years from a researcher who "hacked" the copilot AI by polluting its training data.

3 comments

visarga 1589 days ago

> I think the most basic english language parser could output something along the lines of what was suggested, given an understanding of what valid Python should look like.

OK, I am waiting for you to propose a basic language parser that can do it. There's a reason we're only now having this debate - it was unconceivable 5 years ago, in the era of basic language parsers.

link

BeefWellington 1589 days ago

> OK, I am waiting for you to propose a basic language parser that can do it. There's a reason we're only now having this debate - it was unconceivable 5 years ago, in the era of basic language parsers.

This is really untrue. In fact, making "English as a programming language" was a goal of many older programming languages such as COBOL[1], BASIC, and PASCAL as early as the 60s. It's hardly a new idea and was hardly inconceivable "5 years ago" for something to output a programming language.

The sentence example here could easily be broken down by the ParseTalk model from the mid-90s[2].

Here's a recent ish example (2018) of someone developing a "fully English" programming language:

https://osmosianplainenglishprogramming.blog/2018/05/02/plai...

It's also a source of fun[3][4][5] for people.

These are all examples of either programming languages straight up using English as syntax, or lexical parsers that can break down language and provide you with the programmatic ability to make this kind of output.

The difference here is that while copilot is pulling in python examples based on its training data set, that one thing the author singled out for amazement could easily be done by these older non-ML methods. The value copilot is adding in the example is just outputting python compared to those other methods. The real value is way larger than that, pulling in potentially more complex code to accomplish a complete task.

It's a bit like seeing an all-electric cargo train and being amazed that a train can run on electricity, when electrified light rail has existed for a long time. The impressive part is not that a thing on rails can use electricity to move around, it's the fact that it can pull heavy cargo efficiently enough to make electric power viable.

[1]: https://en.wikipedia.org/wiki/COBOL#COBOL_60

[2]: https://arxiv.org/abs/cmp-lg/9410017

[3]: https://github.com/RockstarLang/rockstar/blob/main/examples/...

[4]: https://en.wikipedia.org/wiki/Shakespeare_Programming_Langua...

[5]: https://github.com/lhartikk/ArnoldC/wiki/ArnoldC

link

sdoering 1589 days ago

Years ago at pyData Berlin I remember a talk trying to classify comments from three major online newspapers with the question if we. Could detect where a comment was made.

One newspaper was left leaning, the other had the reputation of right wing trolls commenting and one was somewhat in the middle ground with a reputation of the audience being pseudo intellectual neoliberalists.

The 'center' (most typical) comment for these three sites totally was in line with these sentiments. The perfect proof (or confirmation bias).

But the classification didn't work. While there were clear cut cases (one has to love stereotypes) most cases were just neutral. Meaning they could have been made on any of these media sites. Either they were just too short or just not extreme enough.

I feel (used explicitly here) that toxicity is not something that is easily classifiable without deeper understanding of the context. Else, if feeling a comment was toxic was the measure one would need to query all walks of life from extreme left to extreme right and afterwards would probably be left with a lot of toxicity that doesn't tell us much except that different people will find different things toxic.

link

jayceekay 1589 days ago

didn't watson turn out to be useless and spaghetti code inside? aka ibm's marketing arm

link

BeefWellington 1589 days ago

I'll preface this by saying that my time working with it was while I was working at IBM, so feel free to take this with a grain of salt. In my time since I've worked in a few Data/ML and Security positions, so I do have a basis for comparison with other systems.

From what I saw, the actual language-processing part of it was top-tier. It's just it's a hard problem to come up with a demo for that people will actually respond positively to, hence the Jeopardy stint. It has limited real applications. It's really good at what it does but what it does isn't really widely useful.

Nobody wants to see "We're going to replace all our online help / support chat stuff with Watson" because people find those systems frustrating already, even if it would make things vastly better than some of the alternatives.

So you end up with weird stuff like Chef Watson, Doctor Watson, and so on -- things in areas where an ML model isn't going to replace a human anytime soon.

Then Marketing gets involved and suddenly anything that uses any kind of ML needs to have Watson slapped on it, even if it's not doing any language processing.

link

esjeon 1589 days ago

Welp, you're downplaying IBM too much. IBM got the product direction right earlier than anyone. Watson is a querying system w/ advanced NLP/IR/KRR capability running on dedicated compute chips, and large corps are more or less following this path. It's just that IBM did it too early and used rather old approaches, which doesn't grow well (thus "spaghetti").

Still, Watson is pretty much the only one in its class. There are good alternatives out there that worked well for many people, but they offer only a subset of Watson's feature set. If an organization need some real bang, Watson is the only option.

link