by nicklo 3919 days ago
Glad to see a critical eye on these sorts of things. I have a question about the choice of loss function. Regression is definitely the wrong choice, since our outputs are binary (0 or 1) and not real-valued (anywhere in the 0-1 range).

We could use a softmax classifier, since it is meant for multi-class classification. Each class in this case would represent a neighboring pixel, and predicting that class would mean activating that pixel. However, softmax assumes exactly one label per example, with probabilities that add up to 1. Our problem is multi-label: each of the 9 neighboring pixels is its own binary decision, and any number of them can be on at once.
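
To make the sum-to-1 problem concrete, here is a rough sketch in plain Python. The logits are made up for illustration (three pixels that "want" to be on, six that don't), and the 0.5 activation threshold mentioned in the comments is my own assumption, not anything from the article.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 9 neighboring pixels:
# the first three should be active, the remaining six should not.
logits = [5.0, 5.0, 5.0, -5.0, -5.0, -5.0, -5.0, -5.0, -5.0]

probs = softmax(logits)
total_prob = sum(probs)   # always 1, by construction
max_prob = max(probs)     # roughly 1/3 here

# With an assumed 0.5 threshold, no pixel would be activated,
# even though three of them have strongly positive logits.
active = [p > 0.5 for p in probs]
```

The three "on" pixels have to split the probability mass between them, so each ends up near 1/3 and none looks confidently active.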

We could train 9 such softmax node groupings, one for each neighboring pixel, but that immediately seems to be a bad idea.

Another awful solution: make each possible pixel configuration (2^9 = 512 of these) its own class and do a standard softmax.
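
For what it's worth, the one-class-per-configuration encoding could look something like this. The function names and the bit ordering (first pixel is the most significant bit) are arbitrary choices of mine, just to show the mapping.

```python
NUM_CLASSES = 2 ** 9  # 512 possible on/off configurations of 9 pixels

def config_to_class(pixels):
    """Map a list of 9 binary pixels to a class index in [0, 511]."""
    if len(pixels) != 9 or any(p not in (0, 1) for p in pixels):
        raise ValueError("expected 9 values, each 0 or 1")
    index = 0
    for p in pixels:
        index = (index << 1) | p  # first pixel ends up as the top bit
    return index

def class_to_config(index):
    """Inverse mapping: class index back to the 9 binary pixels."""
    if not 0 <= index < NUM_CLASSES:
        raise ValueError("class index out of range")
    return [(index >> shift) & 1 for shift in range(8, -1, -1)]

example = [1, 0, 1, 0, 0, 0, 0, 0, 1]
example_class = config_to_class(example)
round_trip = class_to_config(example_class)
```

It works as an encoding, but the softmax then has 512 outputs, and two configurations that differ by a single pixel are treated as completely unrelated classes.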

I can start to see why the author chose to use regression loss as a sort of hack to get this to work, but I'm trying to think out the best, proper solution.

Any thoughts?