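There are layers upon layers of nonlinearity, whether from softmax or sigmoid. In the neural tangent kernel (NTK) view, though, the network does linearize: for a sufficiently wide network, the output stays close to its first-order Taylor expansion in the parameters around initialization. That makes it linear in the weights, even though it remains nonlinear in the inputs.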
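To make "linearize" concrete, here is a minimal sketch in plain Python. It assumes a toy one-hidden-layer sigmoid network with 1/sqrt(width) output scaling; the network and the names `f`, `grad_f`, `f_lin`, `perturb`, and `lin_error` are illustrative choices, not anything from the original comment. It only checks the local statement (the first-order expansion in the parameters is exact at initialization, linear in the parameter displacement, and off by a second-order term nearby); it does not demonstrate the infinite-width limit itself.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def f(params, x):
    """Toy net (assumed for illustration): sum_j a_j * sigmoid(w_j * x) / sqrt(width)."""
    w, a = params
    return sum(aj * sigmoid(wj * x) for wj, aj in zip(w, a)) / math.sqrt(len(w))


def grad_f(params, x):
    """Analytic gradient of f with respect to the parameters (w, a) at input x."""
    w, a = params
    scale = 1.0 / math.sqrt(len(w))
    s = [sigmoid(wj * x) for wj in w]
    dw = [scale * aj * sj * (1.0 - sj) * x for aj, sj in zip(a, s)]
    da = [scale * sj for sj in s]
    return dw, da


def f_lin(params0, params, x):
    """First-order Taylor expansion of f in the parameters around params0."""
    (w0, a0), (w, a) = params0, params
    dw, da = grad_f(params0, x)
    out = f(params0, x)
    out += sum(g * (wj - w0j) for g, wj, w0j in zip(dw, w, w0))
    out += sum(g * (aj - a0j) for g, aj, a0j in zip(da, a, a0))
    return out


def perturb(params0, direction, eps):
    """Move the parameters a distance eps along a fixed direction."""
    (w0, a0), (dw, da) = params0, direction
    return ([w + eps * d for w, d in zip(w0, dw)],
            [a + eps * d for a, d in zip(a0, da)])


rng = random.Random(0)
width = 64
params0 = ([rng.gauss(0, 1) for _ in range(width)],
           [rng.gauss(0, 1) for _ in range(width)])
direction = ([rng.gauss(0, 1) for _ in range(width)],
             [rng.gauss(0, 1) for _ in range(width)])
x = 0.7


def lin_error(eps):
    """Gap between the true network and its linearization after a step of size eps."""
    p = perturb(params0, direction, eps)
    return abs(f(p, x) - f_lin(params0, p, x))


err_small = lin_error(1e-2)
err_smaller = lin_error(1e-3)
print("linearization error at eps=1e-2:", err_small)
print("linearization error at eps=1e-3:", err_smaller)
```

Shrinking the step by a factor of 10 should shrink the gap by roughly a factor of 100, which is the signature of a first-order approximation. The NTK argument is that in very wide networks the parameters move so little during training that this approximation holds throughout.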