by ryandrake 959 days ago
I always like to see upstream corrective action after something like this. If there was adequate logging / error reporting, this wouldn't have taken a week to fix. Whatever library he sent the invalid "image/jpg" MIME type to should have thrown an exception, crashed, or at the very least, logged loudly. I wonder if OP filed a bug against it.
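As a rough illustration of "fail loudly", here is a minimal Python sketch of a receiving side that rejects an unknown MIME type instead of silently accepting it. The allow-list, the exception name, and the function name are all hypothetical; nothing here is taken from the article or from whatever library the author actually used.

```python
import logging

logger = logging.getLogger("upload")

# Hypothetical allow-list. "image/jpg" is deliberately absent because the
# registered media type is "image/jpeg".
ALLOWED_IMAGE_TYPES = frozenset({"image/jpeg", "image/png", "image/gif"})


class UnsupportedMediaType(ValueError):
    """Raised when an upload declares a MIME type we do not accept."""


def validate_content_type(content_type: str) -> str:
    """Return the normalized MIME type, or log and raise if it is not allowed."""
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    normalized = content_type.split(";", 1)[0].strip().lower()
    if normalized not in ALLOWED_IMAGE_TYPES:
        # Log loudly AND raise, so the failure is visible in both places.
        logger.error("rejected upload: unsupported content type %r", content_type)
        raise UnsupportedMediaType(f"unsupported content type: {content_type!r}")
    return normalized
```

With a check like this, passing "image/jpg" raises immediately rather than producing an upload that appears to succeed.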

Yeah, at the end of the article I even wondered what kind of environment Shawn (author) works in where it would take so long to diagnose?

Was Shawn able to access anything on the server that would confirm/deny that the image upload was coming through? Why did the image upload work in the test environment but not in the released version of the app? What was different about the test environment?

In theory, Shawn should have had enough access to the server environment (either by running the servers himself or asking someone to help him diagnose why an upload failed silently) that he should have had a reasonably quick answer to "why is this upload succeeding but not showing up?"

IMO, those lessons (why the upload worked in test but not in production) are significantly more important than "the image mime type was set to 'jpg' but should have been 'jpeg'". The bug itself matters much less than why the environment made it so hard to find.

In my case, I had a situation where a desktop application was severely malfunctioning, but errors were not being logged. It took me multiple days to realize that the application was running out of file handles, and that log4net wouldn't log if it couldn't get a file handle. Although the immediate fix (reverting a very small bugfix) was simple, the real fix was to customize log4net to always keep the log file open. This way, if the application ran out of file handles, the error would still be logged.
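The same idea can be sketched in Python rather than log4net (this is an analogy, not the configuration I used): open the log file once at startup and hold on to the handle, so that later handle exhaustion cannot stop errors from being written. The function name and logger name are made up for the example.

```python
import logging


def make_always_open_logger(path: str, name: str = "app") -> logging.Logger:
    """Create a logger whose file handle is acquired once, at startup."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # delay=False (the default) opens the file immediately and keeps the
    # stream open for the handler's lifetime, instead of opening it lazily
    # on the first log call.
    handler = logging.FileHandler(path, delay=False)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

The trade-off is that the process holds one handle permanently, which is usually a fair price for being able to log the moment everything else runs out.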

> Was Shawn able to access anything on the server that would confirm/deny that the image upload was coming through? Why did the image upload work in the test environment but not in the released version of the app? What was different about the test environment?

This reminds me of a product I worked on where several (in fact most) of the production-critical APIs (banking APIs, transfer APIs, etc.) had major undocumented differences between production and test instances to the point where if something was using them you just couldn't ever be sure that what you had was actually correct. Some of this stemmed from some of the APIs technically being mandated by law and there was no interest in actually making them good, but some of them were actual B2B solutions that just sucked for no apparent reason.

At points like these it's (IMO) quite defensible to build a very comprehensive adapter that basically does most of the surface work of the API you're using as best you know right now, i.e. almost pretends that it is the other system to a degree where you're re-implementing large parts of it.

Yup, that one paragraph left me scratching my head. In my mind the (very thinly described) image upload functionality should have failed regardless of configuration.

Maybe by "the images simply wouldn't upload." the author meant "did not display", and the file was being uploaded to the data store, was visible when viewed in the data store directly, but would not be displayed in app when requested.

I got the feeling that this is one of those 500-mile email[1] stories, where technical details are omitted for easier storytelling.

[1] https://news.ycombinator.com/item?id=9338708

It's not the technical details that are omitted, though. The 500 mile email story told us about the debugging process and limitations imposed by the speed of light.

Shawn correctly points out that it's okay to get stuck for a while: It happens to everyone. But he never actually has a lesson that he applies to his job.

In the 500 mile email story, the real lesson has more to do with understanding the risks of upgrades and the need to manage them.

The relevant paragraph in the article bothers me a bit:

"I re-uploaded a version with improved error handling, but image uploads were failing without any feedback. You see, normally code screams its errors at you in red text - silence is the goal. Here silence was the problem."

Silence is not quite the goal. Too many developers think silence is the goal, but the goal is actually accuracy. If there's no error, yes, it should be silent. If there's an error that affects the user, there should be a big red alert box. I believe developers should come to love error messages. Well-written error messages reveal causes quickly and save everyone a lot of time.

I hope this developer has learned to show error messages more often. That would be a great outcome.

I would say it should be a big red alert box with a simple error code that the end user can reference when trying to find help.

The applications (CLIs, native, web, etc) I've seen that present me with non-actionable errors are a perpetual source of irritation.

"Failed to open file"

"File could not be uploaded"

etc

Not only are these useless to the user who can't do a thing about them except try the same thing again, they're useless to the developer or support engineer who might be trying to help them.
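For contrast, a minimal Python sketch of an error that carries a code, a cause, and a next step. The class name, the "UPL-415" code scheme, and the message wording are all invented for illustration.

```python
class UploadError(Exception):
    """An upload failure with a referenceable code and an actionable hint."""

    def __init__(self, code: str, summary: str, detail: str, hint: str):
        super().__init__(f"[{code}] {summary}")
        self.code = code        # short code the user can quote to support
        self.summary = summary  # what went wrong, in plain words
        self.detail = detail    # technical cause, for developers and support
        self.hint = hint        # what the user can try next

    def user_message(self) -> str:
        """Text for the alert box shown to the end user."""
        return f"{self.summary} (error {self.code}). {self.hint}"

    def support_message(self) -> str:
        """Text for logs and support tickets."""
        return f"{self.code}: {self.detail}"


err = UploadError(
    code="UPL-415",
    summary="File could not be uploaded",
    detail="server rejected content type 'image/jpg'",
    hint="Try saving the image as PNG or JPEG and uploading it again.",
)
```

The user sees something they can act on and quote, and whoever is helping them gets the actual cause instead of guessing.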

Related, the "oops, something went wrong" error messages absolutely infuriate me. Something about the tone, like a child spilling milk, and the uselessness of the message get under my skin.