by Jaecen 3419 days ago
This title seems incorrect. The article doesn't specify any other vendors or products that have been directly affected by this issue.
3 comments

Near the bottom, the article currently states:

"Other vendors using Atom C2000 chips include Aaeon, HP, Infortrend, Lanner, NEC, Newisys, Netgate, Quanta, Supermicro, and ZNYX Networks. The chipset is aimed at networking devices, storage systems, and microserver workloads."

I'm guessing that may represent what OP meant (?)

It says other vendors are using the chip, but there's no data on failures of other devices. We don't know what causes the chip to fail, but it's possible that Cisco's application may be uniquely, or at least uncommonly, susceptible.
There are lots of reports from people whose other C2000 boards failed after a few months. The ASRock board is common in NAS builds because of its 12 SATA ports. Most of the failure reports are similar.

https://www.amazon.com/ASRock-Rack-Mini-Motherboards-C2550D4...

https://www.google.com/search?q=ASRock+C2550D4I+failure+rate...

The title is technically correct, just annoyingly written. As someone who's built a pfSense box using a Supermicro board with one of the affected chips, I'm definitely sad that I'll have to rip it apart to replace the parts.
I have the same problem: I'm using various C2000-based Supermicro boxes running pfSense. Until now, the most cost-effective DIY, rack-mountable solution for a pfSense box was the SYS-5018A-FTN4. Do you know if Supermicro issued a technical bulletin about this box?
Last Friday, my OpenBSD firewall, which runs on a SYS-5018A-FTN4, mysteriously crashed. I chalked it up to an alpha particle or something and rebooted. About 12 hours later, it failed again. This time I did some more digging. On the console was the following message:

  NMI ... going to debugger
  Stopped at    acpicpu_idle+0x22d:     nop
  ddb{0}>
I googled it and found one similar report on the OpenBSD misc mailing list from September 2016 [1]. Interestingly, the person who reported the bug was running the same Supermicro board as I was. The report didn't get anywhere other than a vague suggestion that it might be heat related. These boxes run very cool and I didn't think that was likely. I thought it might be a RAM issue and that it was probably just a coincidence that the other person had the same hardware as I, but now I'm inclined to think that both of us have experienced the issue described in TFA.

Seems like I'll be looking for new firewall hardware.

[1] https://www.mail-archive.com/misc@openbsd.org/msg149348.html

If you were able to reboot the box then you did not hit this issue. When you hit this issue your chip is dead.
This may be completely unrelated though.
Ah crap. I guess the reseller selling me old-new stock of an Avoton system http://www.supermicro.com/products/chassis/tower/721/SC721TQ... isn't really going to care. Shipping the product back would be ~150+ AUD. Can't buy this one in Australia, unfortunately.
Yeah, these Avoton-based boards seemed popular in the FreeNAS / DIY home server community for being cheap and low power while supporting ECC RAM. Even the official FreeNAS Mini server used (and still used when I checked last year) a Supermicro board with an Avoton CPU.
Websites seem to suggest it's running the C2750, so it does appear to be affected.
Confirmed. I have a FreeNAS mini, and it has a C2750 in it.
The Avoton? Shame, really, it seems like a great board otherwise.
It's part of the errata for the chip. Go to:

http://www.intel.com/content/dam/www/public/us/en/documents/...

and search for AVR54

I understand that the chip has a flaw. The title claims non-Cisco products are being bricked. What other products have actually been impacted by this issue? The article doesn't give any data, just a list of vendors using the chip. Is there any proof other devices are impacted by this issue?

I'm not claiming that the chip isn't failing; I'm disappointed that the title makes a claim that the article doesn't deliver on.

Check the Synology forums linked in the article.

Quite a few units have completely died without explanation. From the descriptions given by the users, it does sound like a dead CPU.