by _chendo_
3973 days ago
Going to list a few interesting problems that were easy to fix but hard to identify.

* A test suite we wrote for a client's project before a massive refactor was stalling randomly, but would continue whenever you tried to diagnose the problem. Turns out their user creation code read from /dev/random, and the system was running out of entropy, so the reads were blocking. Moving the mouse or typing on the keyboard would add entropy and thus cause the tests to resume. The fix was to use /dev/urandom for tests.

* Found a weird issue with an embedded network stack where sending a limited broadcast packet to more than 3 devices would get a response from only a few of them, while directed packets to each device worked fine. The devices reported successfully receiving and transmitting when monitored over the serial console. The issue turned out to be a bug in the ARP implementation, which incorrectly stored any ARP response it saw (rather than only the ARP responses the device had requested). The embedded system has a small ARP cache due to memory constraints, so when multiple devices wanted to respond, they would all send ARP requests, and the resulting responses would flush the ARP cache. By the time the network stack wanted to send its response, it no longer knew which MAC to use and just dropped the packet on the floor. A workaround was to increase the ARP cache size.
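
For anyone who wants to see the ARP failure mode in miniature, here is a rough Python sketch. It is a toy model, not the actual embedded stack: the `ArpCache` class, the `simulate` helper, the oldest-first eviction policy, the cache capacity of 2, and all of the addresses are assumptions made up for illustration.

```python
from collections import OrderedDict


class ArpCache:
    """Toy fixed-size ARP cache (hypothetical; evicts the oldest entry first)."""

    def __init__(self, capacity, solicited_only):
        self.capacity = capacity
        self.solicited_only = solicited_only  # False models the buggy behaviour
        self.entries = OrderedDict()          # ip -> mac, oldest entry first
        self.pending = set()                  # IPs this device has asked about

    def request(self, ip):
        # Record that this device sent an ARP request for `ip`.
        self.pending.add(ip)

    def on_reply(self, ip, mac):
        # A correct cache ignores replies it never asked for.
        if self.solicited_only and ip not in self.pending:
            return
        self.pending.discard(ip)
        self.entries[ip] = mac
        self.entries.move_to_end(ip)
        # Evict the oldest entries once the cache is over capacity.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def lookup(self, ip):
        return self.entries.get(ip)


def simulate(solicited_only, capacity=2, other_devices=4):
    """Return the MAC the device still knows for the host, or None if evicted."""
    cache = ArpCache(capacity, solicited_only)
    host_ip = "10.0.0.1"

    # The device resolves the host it wants to answer.
    cache.request(host_ip)
    cache.on_reply(host_ip, "aa:aa:aa:aa:aa:01")

    # Meanwhile it overhears replies meant for other devices on the segment.
    for i in range(other_devices):
        cache.on_reply(f"10.0.0.{10 + i}", f"bb:bb:bb:bb:bb:{i:02x}")

    return cache.lookup(host_ip)
```

In this model `simulate(solicited_only=False)` returns `None` (the host entry was flushed by overheard replies, so the response would be dropped), `simulate(solicited_only=True)` still returns the host MAC, and `simulate(solicited_only=False, capacity=8)` also returns it, which mirrors the "increase the cache size" workaround.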
|
Funny how this goes completely against the typical operant conditioning a user undergoes when working with computers. Usually if your software hangs, you want to touch nothing and let it finish. But in this case it's actually additional user activity that's needed.