by gwbas1c 961 days ago
Yeah, at the end of the article I even wondered what kind of environment Shawn (the author) works in where it would take so long to diagnose.

Was Shawn able to access anything on the server that would confirm/deny that the image upload was coming through? Why did the image upload work in the test environment but not in the released version of the app? What was different about the test environment?

In theory, Shawn should have had enough access to the server environment (either by running the servers himself or asking someone to help him diagnose why an upload failed silently) that he should have had a reasonably quick answer to "why is this upload succeeding but not showing up?"

IMO, those lessons (why the upload worked in test but not in production) are significantly more important than "the image mime type was set to 'jpg' but should have been 'jpeg'". The bug itself matters far less than why the environment made it so hard to find.
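For what it's worth, here is a minimal sketch of the kind of server-side check that would have turned the silent failure into a loud one. It assumes the upload carries a declared content type; the names `ALLOWED_IMAGE_TYPES`, `UnsupportedMediaType` and `validate_upload` are hypothetical and not anything from the article.

```python
# Hypothetical allow-list; "image/jpg" is deliberately absent because the
# registered MIME type for JPEG files is "image/jpeg".
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif"}


class UnsupportedMediaType(ValueError):
    """Raised instead of silently dropping an upload with a bad type."""


def validate_upload(filename: str, declared_type: str) -> str:
    """Return the declared type if it is allowed, otherwise raise loudly."""
    normalized = declared_type.strip().lower()
    if normalized not in ALLOWED_IMAGE_TYPES:
        # The message names the file and the offending value so that it
        # shows up in logs with enough context to diagnose quickly.
        raise UnsupportedMediaType(
            f"{filename}: unsupported content type {declared_type!r}; "
            f"expected one of {sorted(ALLOWED_IMAGE_TYPES)}"
        )
    return normalized


try:
    validate_upload("photo.jpg", "image/jpg")
    error_message = None
except UnsupportedMediaType as exc:
    error_message = str(exc)
```

Whether the real server does anything like this is unknown; the point is only that a rejected upload should produce an error somebody can see.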

In my case, I had a situation where a desktop application was severely malfunctioning, but errors were not being logged. It took me multiple days to realize that the application was running out of file handles, and that log4net wouldn't log if it couldn't get a file handle. Even though the fix (reverting a very small bugfix) was simple, the real fix was to customize log4net to always keep the log file open. This way, if the application ran out of file handles, the error would still be logged.
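To illustrate the idea without pretending to show the actual log4net customization: below is a rough analogue using Python's standard `logging` module, where `FileHandler` with `delay=False` opens the log file when the handler is created and keeps it open. The logger name and file path are made up for the example.

```python
import logging
import os
import tempfile


def make_eager_logger(path: str):
    """Build a logger whose file handle is acquired at startup."""
    # delay=False (the default) opens the file immediately, so the handle
    # is already held before the process can run out of file handles.
    handler = logging.FileHandler(path, delay=False)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger("app.eager")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, handler


# Hypothetical log location for the demonstration.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger, handler = make_eager_logger(log_path)

# Later, when something goes wrong, the write reuses the open handle
# instead of trying to open the file again.
logger.error("out of file handles")
handler.flush()
```

The design choice is the same in either ecosystem: pay for one file handle up front so that the error path doesn't depend on the very resource that has run out.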

2 comments

> Was Shawn able to access anything on the server that would confirm/deny that the image upload was coming through? Why did the image upload work in the test environment but not in the released version of the app? What was different about the test environment?

This reminds me of a product I worked on where several (in fact most) of the production-critical APIs (banking APIs, transfer APIs, etc.) had major undocumented differences between production and test instances, to the point where if something used them you could never be sure that what you had was actually correct. Part of this stemmed from some of the APIs technically being mandated by law, so there was no interest in actually making them good, but others were actual B2B solutions that just sucked for no apparent reason.

At points like these it's (IMO) quite defensible to build a very comprehensive adapter that basically does most of the surface work of the API you're using as best you know right now, i.e. almost pretends that it is the other system to a degree where you're re-implementing large parts of it.
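A very small sketch of what I mean, with everything in it invented for illustration (`TransferApi`, `FakeTransferApi`, the rules it enforces, the reference format): your own code depends on a narrow interface, and an in-memory stand-in encodes the production behaviour as best you currently understand it.

```python
from dataclasses import dataclass, field
from typing import Protocol


class TransferApi(Protocol):
    """The narrow surface our own code depends on (hypothetical)."""

    def transfer(self, src: str, dst: str, cents: int) -> str: ...


@dataclass
class FakeTransferApi:
    """In-memory stand-in that mimics what we believe production does."""

    balances: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def transfer(self, src: str, dst: str, cents: int) -> str:
        # Each rule here is an assumption about production, written down
        # in one place so it can be corrected when reality disagrees.
        if cents <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(src, 0) < cents:
            raise ValueError("insufficient funds")
        self.balances[src] -= cents
        self.balances[dst] = self.balances.get(dst, 0) + cents
        ref = f"fake-{len(self.log) + 1}"
        self.log.append((ref, src, dst, cents))
        return ref


api: TransferApi = FakeTransferApi(balances={"alice": 1000})
ref = api.transfer("alice", "bob", 250)
```

Every time production surprises you, the fake gets another rule, and the adapter slowly becomes the documentation the vendor never wrote.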

Yup, that one paragraph left me scratching my head. In my mind the (very thinly described) image upload functionality should have failed regardless of configuration.

Maybe by "the images simply wouldn't upload." the author meant "did not display", and the file was being uploaded to the data store, was visible when viewed in the data store directly, but would not be displayed in app when requested.

I got the feeling that this is one of those 500-mile email[1] stories, where technical details are omitted for easier storytelling.

[1] https://news.ycombinator.com/item?id=9338708

It's not the technical details that are omitted, though. The 500-mile email story told us about the debugging process and the limitations imposed by the speed of light.

Shawn correctly points out that it's okay to get stuck for a while: it happens to everyone. But he never actually draws a lesson that he applies to his job.

In the 500-mile email story, the real lesson has more to do with understanding the risks of upgrades and the need to manage them.