The network learns something about the relationship between characters, and by extension phonemes, in the training set of nice sounding words. Obviously it could not learn the image the word invokes, given the size of the training set and since there are no negative examples.
"turdurine" sounds nice to me; "amamanus" does not