Hacker News
by _chendo_ 3973 days ago
Going to list a few interesting problems that were easy to solve but hard to identify.

* A test suite we wrote for a client's project before a massive refactor was stalling randomly, but would resume as soon as you tried to diagnose the problem. Turns out their user creation code read from /dev/random, and the system was running out of entropy, so the read was blocking. Moving the mouse or typing on the keyboard added entropy, which let the tests resume. The fix was to use /dev/urandom for tests.

* Found a weird issue with an embedded network stack where sending a limited broadcast packet to more than 3 devices would only get responses from a few of them, while directed packets to each device worked fine. The devices reported successfully receiving and transmitting when monitored over the serial console. The issue turned out to be a bug in the ARP implementation: it would store any ARP response it saw, rather than only responses to requests the device itself had made. Since the embedded system has a limited ARP cache due to memory constraints, when multiple devices wanted to respond, they would all send ARP requests, and the resulting responses would flush the ARP cache. So when the network stack went to send its response, it no longer knew which MAC address to use and just dropped the packet on the floor. A workaround was to increase the ARP cache size.
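The client's user creation code isn't shown, so here is a minimal sketch, assuming a hypothetical Python `create_user` that needs a random salt. Rather than hardcoding /dev/random (which, on Linux kernels before 5.6, could block when the kernel's entropy estimate ran low), it takes the randomness source as a parameter. Tests can then rely on `os.urandom`, which on Linux does not wait on an entropy estimate once the kernel pool has been initialized, or on a fixed source when a test needs a deterministic value.

```python
import os
from typing import Callable

# Hypothetical interface: any callable that returns n random bytes.
ByteSource = Callable[[int], bytes]


def create_user(name: str, random_bytes: ByteSource = os.urandom) -> dict:
    """Hypothetical user-creation helper; the salt comes from an injectable source."""
    salt = random_bytes(16)
    return {"name": name, "salt": salt.hex()}


# Default source: os.urandom, so the test run never stalls waiting for entropy.
user = create_user("alice")

# Deterministic source for tests that assert on an exact value.
fixed = create_user("bob", random_bytes=lambda n: b"\x00" * n)
print(fixed["salt"])  # 32 zero characters
```

Injecting the source keeps the choice of device out of the business logic, so a test suite can't accidentally inherit a blocking read from production configuration.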
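The ARP failure can be modelled with a small simulation. This is a sketch in Python, not the embedded stack's actual code: `ArpCache`, its `capacity`, and the `accept_unsolicited` flag are hypothetical names. With `accept_unsolicited=True` the cache stores every reply it sees (the bug); with `False` it only stores replies to addresses it asked about.

```python
from collections import OrderedDict
from typing import Optional


class ArpCache:
    """Hypothetical fixed-size IP -> MAC cache for a memory-constrained stack."""

    def __init__(self, capacity: int, accept_unsolicited: bool = False):
        self.capacity = capacity
        # True reproduces the bug: store every ARP reply seen on the wire.
        self.accept_unsolicited = accept_unsolicited
        self.entries = OrderedDict()  # IP -> MAC, oldest entry first
        self.pending = set()          # IPs this device has sent ARP requests for

    def request(self, ip: str) -> None:
        # Remember that we asked "who has <ip>?" (the frame itself is not modelled).
        self.pending.add(ip)

    def on_reply(self, ip: str, mac: str) -> None:
        if ip not in self.pending and not self.accept_unsolicited:
            return  # ignore replies to requests this device did not make
        self.pending.discard(ip)
        self.entries[ip] = mac
        self.entries.move_to_end(ip)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry

    def lookup(self, ip: str) -> Optional[str]:
        # None means there is no MAC to send to, so the packet gets dropped.
        return self.entries.get(ip)


def simulate(accept_unsolicited: bool, capacity: int = 2) -> Optional[str]:
    cache = ArpCache(capacity=capacity, accept_unsolicited=accept_unsolicited)
    cache.request("10.0.0.1")  # this device wants to respond to 10.0.0.1
    cache.on_reply("10.0.0.1", "aa:aa:aa:aa:aa:01")
    # Replies meant for other devices arrive before our response is sent.
    for i in range(2, 5):
        cache.on_reply(f"10.0.0.{i}", f"aa:aa:aa:aa:aa:0{i}")
    return cache.lookup("10.0.0.1")


print(simulate(accept_unsolicited=False))  # aa:aa:aa:aa:aa:01
print(simulate(accept_unsolicited=True))   # None: the entry was evicted
```

Raising `capacity` to 4 makes the buggy version survive this particular run, which mirrors why enlarging the cache worked as a workaround: it only helps while fewer stray replies arrive than the cache can hold. The model is a simplification; RFC 826 lets a host refresh an entry it already holds from any ARP packet, but only adds a new entry when the packet targets that host.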


> Turns out their user creation code used /dev/random, and the system was running out of entropy and so the code was blocking. Moving the mouse or typing on the keyboard would add entropy, thus cause the tests to resume.

Funny how this goes completely against the typical operant conditioning a user undergoes when working with computers. Usually if your software hangs, you want to touch nothing and let it finish. But in this case, it's actually additional user activity that's needed.

It seems perfectly in line with what I generally see. A hang usually results in some desperately mashed combination of Esc, Space, and Enter, then clicking on absolutely everything, and finally mashing Ctrl-Alt-Del in the hope that something happens. The let-it-do-its-thing-and-wait crowd has always been at the higher end of the technical knowledge spectrum.