| HN Mirror


by MrMeritology 3744 days ago

I just wrote a follow-up post giving more evidence: http://exploringpossibilityspace.blogspot.com/2016/03/micros...

I call it "poor software QA" because, generally, the software QA process is supposed to detect defects and prevent them from 1) being introduced in the first place and 2) being propagated into "production" versions. As my most recent post shows, the "repeat after me" rule was a legacy of a software library (ALICE) they used to implement rule-based behavior. Some sort of QA process should have been applied to the rule set they reused and modified.

Also, when I say "QA" I am not referring only to people with QA in their job title. I'm referring to the process.

1 comments

jdp23 3744 days ago

Thanks for the reply, and the interesting follow-up on ALICE.

If you're using "software QA failure" in the very general sense of "a defect got introduced and then propagated to production", then yes that's what happened here. But then you're essentially saying "the root cause of this defect is that a defect got introduced and then propagated". This isn't useful for process improvement (it's true for every defect, so doesn't give any insight into what happened).

If you're right about them reusing ALICE, then a more useful way of looking at the root cause of the repeat-after-me bug is "component reuse without considering the attack model". That highlights other situations where there are risks of similar defects, and points to ways to prevent or detect similar defects.

Since there were other bugs as well, the requirement and/or design issues might still be a better candidate for root cause for the whole Tay-fail. One of the things you discover doing root cause analysis is that there are almost always multiple contributors, and you typically want to make process changes at multiple levels.


MrMeritology 3743 days ago

I'm not using "software QA" in the very broad way you describe in the first sentence. Also, "defect" as I use it does not mean every shortcoming of the product. It means "something doesn't work (or failed) as designed or required."

Context: my blog posts are meant to contrast with "experts" who claimed that poisoning social AI was just in the nature of learning systems, even when they worked as designed (i.e. had no defects). They are claiming that Tay learned to be foul-mouthed and racist, and thus had become foul-mouthed and racist.

If that were true, then this undesirable behavior would not be a software QA problem. The AI would be working as designed. No QA process would change things.

In contrast, I'm claiming (from evidence) that the main problem in Tay is due to a hidden feature in a reused library + rule set, which should have been detected and removed by a QA process that considered various attacks. BTW, this attack (getting a bot to repeat naughty words) has been around since ELIZA in the 60s.

The other failings of Tay (esp. the lack of a blacklist) are design and requirement failures, not QA failures.
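To make the distinction concrete, here is a minimal sketch of the kind of attack-aware QA check described above. It is not Tay's or ALICE's actual code: the echo rule, the placeholder blacklist words, and every function name are hypothetical, chosen only to show how a test suite could flag a "repeat after me" rule inherited from a reused rule set.

```python
import re

# Hypothetical stand-in for an inherited "repeat after me" rule.
# This is NOT the real ALICE/AIML rule, just an illustration.
ECHO_RULE = re.compile(r"^\s*repeat after me[:,]?\s*(.+)$", re.IGNORECASE)

# Placeholder blacklist; a real one would be far larger and curated.
BLACKLIST = {"badword", "slur"}


def naive_reply(message):
    """Rule-based reply that blindly echoes whatever follows the trigger."""
    match = ECHO_RULE.match(message)
    if match:
        return match.group(1)
    return "Tell me more."


def contains_blacklisted(text):
    """Return True if any word in the text is on the blacklist."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLACKLIST for word in words)


def guarded_reply(message):
    """Same rule set, but the output is filtered before it is sent."""
    reply = naive_reply(message)
    if contains_blacklisted(reply):
        return "I'd rather not repeat that."
    return reply


def find_echo_leaks(reply_fn, probes):
    """QA check: list the probes whose replies contain blacklisted words."""
    return [probe for probe in probes if contains_blacklisted(reply_fn(probe))]


# Hypothetical attack probes a QA pass might run against a reused rule set.
ATTACK_PROBES = [
    "repeat after me: badword",
    "Repeat after me, you are a slur",
    "hello there",
]

if __name__ == "__main__":
    print("naive leaks:", find_echo_leaks(naive_reply, ATTACK_PROBES))
    print("guarded leaks:", find_echo_leaks(guarded_reply, ATTACK_PROBES))
```

In this sketch, running the probes is the QA step (it would catch the inherited echo rule), while the output filter in `guarded_reply` corresponds to the design-level blacklist; the two are separate safeguards.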


jdp23 3743 days ago

> I'm not using "software QA" in the very broad way you describe in the first sentence.

I took this description from your earlier comment, "I call it "poor software QA" because 1) and 2)", so yes, you are using it that way at least sometimes :)

Anyhow, we obviously see things differently on the root cause side (and both of us are on the outside, so there's a lot we don't know). That said, I certainly agree that it's a defect, and that it's an attack that could reasonably have been anticipated.
