by rhdunn
971 days ago
Often this comes down to how well and how descriptively the training data is labelled. If you just label a picture as "man" or "woman", you are not going to get good results compared to something like "face of a caucasian man of Italian descent in his mid 20s with red hair, green eyes, and pale skin, with a blue background". You also need consistently labelled data so that the model has a chance to learn the differences properly. I've also seen the image models fail to understand context: if you ask for e.g. "green eyes", it will often place the subject in grass or against a green background, select green clothes, etc. -- i.e. it is only learning the association with the colour and not the association with a particular facial feature. The image models are also very bad at feature shifting and at understanding how features combine, resulting in things like multiple arms because two of the images it is splicing have the arms in different positions.
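To make the consistency point a bit more concrete, here is a rough sketch of the kind of check you could run over captions before training. It is only an illustration: the attribute list, the keyword matching, and the `missing_attributes` helper are all made up for this example and are not taken from any real labelling pipeline.

```python
import re

# Hypothetical checklist: attributes we assume every caption should mention,
# each with the keywords that count as mentioning it.
REQUIRED_ATTRIBUTES = {
    "subject": ("man", "woman"),
    "hair": ("hair",),
    "eyes": ("eyes",),
    "background": ("background",),
}


def missing_attributes(caption):
    """Return the names of checklist attributes the caption never mentions."""
    # Compare whole words so that "woman" is not counted as a match for "man".
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return [
        name
        for name, keywords in REQUIRED_ATTRIBUTES.items()
        if not any(keyword in words for keyword in keywords)
    ]


captions = [
    "man",
    "face of a caucasian man of Italian descent in his mid 20s with red hair, "
    "green eyes, and pale skin, with a blue background",
]

for caption in captions:
    print(missing_attributes(caption), "<-", caption[:40])
```

A keyword check like this only tells you whether an attribute is mentioned at all, not whether the description is accurate, so at best it flags the sparse captions for a human to look at.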