by alex7734
1121 days ago
Yes and no. It took me from 8am to 3am once we decided it needed to get fixed, but really the bug sat in the app for years. It only happened in a background process that sent print jobs on a timer; since that process used Windows GDI to compose the image we sent to the printer, it was affected (our "frontend" should've been affected too but never was, I guess because it had a different memory usage pattern).

We just had it restart itself and try again whenever it got one of those errors when printing, but eventually we wanted to add a feature that required the process to not die. By that time I was already 99% sure that it wasn't something in our code, and I had already ruled out threading issues.

I ended up putting it in a VM with a kernel debugger attached and having a script take a snapshot and make it print over and over until it errored, then following along in IDA until I saw what was going on. Having a way to trigger it on demand (by restoring the snapshot) helped a lot; otherwise it would have taken forever to make sense of it, as it could sit without crashing for nearly an hour.
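For anyone wanting to try the same trick, here is a rough sketch of the snapshot-then-print loop as I read it from the description above. It is not the original script: `capture_failing_snapshot`, `take_snapshot` and `run_print_job` are hypothetical names, and the two callables stand in for whatever your hypervisor and print process actually expose.

```python
from typing import Callable, Optional


def capture_failing_snapshot(
    take_snapshot: Callable[[], str],    # hypothetical: saves VM state, returns an id
    run_print_job: Callable[[], bool],   # hypothetical: True on success, False on error
    max_attempts: int = 10_000,
) -> Optional[str]:
    """Snapshot, run one print job, repeat.

    Returns the id of the snapshot taken just before the first failing job,
    or None if no job failed within max_attempts.
    """
    for _ in range(max_attempts):
        snapshot_id = take_snapshot()    # state saved *before* the job runs
        if not run_print_job():          # the job hit the error
            return snapshot_id
    return None
```

Restoring the returned snapshot should put the VM back just before the failing job, which is what makes the bug reproducible on demand, assuming the failure depends on the saved state rather than on timing.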
|
While I was at MS, it was such a big PITA - we just had a bunch of IT-managed machines with KVM console access and KDNET for debugging.