by linuxrebe1
672 days ago
Based on the way you were troubleshooting it, you can tell you're a programmer first. You went to your code, you went to your logs. Both are reasonable, and both are potential causes of the problem. But both ignore the primary clue you had: it worked on localhost. As an SRE/devops/platform engineer, or whatever the title of the day is, I would have zeroed in on the differences between the working system and the non-working system, either adding and then removing, or removing and then adding back, those differences one at a time until something worked. What I see is two things:
1) you have an environment where it does work.
2) the failing environment was working, then started failing.

Is my method superior to yours? No. I'm only stating it to highlight the difference in the way we look at a problem. Both of us zero in on what we know. I know systems, you know code.
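A rough sketch of the "change one difference at a time" approach, assuming each environment can be modeled as a flat mapping of setting name to value (hostnames, env vars, flags, versions) and that you have some check telling you whether a given environment works. The names `Env`, `differences`, `with_setting_from`, `isolate_single_difference` and the `works` predicate are all hypothetical, not from any tool. It starts from the failing environment and carries over one difference at a time from the working one:

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: an environment is a flat mapping of setting name -> value.
Env = Dict[str, str]


def differences(working: Env, failing: Env) -> List[str]:
    """Names of settings that differ, or that exist in only one environment."""
    names = set(working) | set(failing)
    return sorted(n for n in names if working.get(n) != failing.get(n))


def with_setting_from(env: Env, name: str, source: Env) -> Env:
    """Copy of `env` with one setting taken from `source` (dropped if `source` lacks it)."""
    candidate = dict(env)
    if name in source:
        candidate[name] = source[name]
    else:
        candidate.pop(name, None)
    return candidate


def isolate_single_difference(
    working: Env, failing: Env, works: Callable[[Env], bool]
) -> Optional[str]:
    """Carry over one difference at a time from the working environment into
    the failing one; return the first change that makes it work, or None if
    no single difference fixes it on its own."""
    for name in differences(working, failing):
        if works(with_setting_from(failing, name, working)):
            return name
    return None


if __name__ == "__main__":
    working = {"host": "localhost", "tls": "off", "db": "local"}
    failing = {"host": "app.internal", "tls": "on", "db": "remote"}
    # Hypothetical health check: in this made-up case only TLS breaks things.
    print(isolate_single_difference(working, failing, lambda env: env.get("tls") == "off"))
```

This only finds a cause that is a single difference; if it returns None, the problem likely needs two or more changes together, and you'd start combining differences instead.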
plug a good board into an extender. run the failing diagnostic in a loop. using a scope, look at every pin on the connector and write down what you see. replace it with a bad board. repeat.
which signals are different? chase them back. if the board does not match the schematic, get out a voltmeter and your eyes and draw a schematic that reflects how the board is actually wired.
he called this "good card - bad card".
and it worked. i'm not going to make any claims about cost effectiveness, but we fixed every board, and my troubleshooting skills in digital electronics greatly improved.
this was a 'fireman' kind of job: we were waiting for the system to break, so it didn't matter if 2 techs put a week into 1 circuit board.
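The "which signals are different?" step of good card / bad card boils down to diffing two sets of notes. A small sketch, assuming you recorded one short description per connector pin for each board; the pin names, readings, and the `differing_signals` helper are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical notes: what the scope showed at each connector pin, keyed by pin name.
Readings = Dict[str, str]


def differing_signals(good: Readings, bad: Readings) -> List[Tuple[str, str, str]]:
    """Return (pin, good_reading, bad_reading) for every pin where the boards disagree."""
    pins = sorted(set(good) | set(bad))
    return [
        (pin, good.get(pin, "<not recorded>"), bad.get(pin, "<not recorded>"))
        for pin in pins
        if good.get(pin) != bad.get(pin)
    ]


if __name__ == "__main__":
    good = {"A1": "4 MHz clock", "A2": "high", "B7": "toggling"}
    bad = {"A1": "4 MHz clock", "A2": "floating", "B7": "toggling"}
    for pin, g, b in differing_signals(good, bad):
        print(f"{pin}: good board = {g}, bad board = {b}")
```

Each pin it reports is a signal to chase back toward its source on the bad board.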