| As an objective criterion, what percentage include pedals and a chain connecting to one of the wheels? I quickly found a dozen and stopped counting. Now do the same for those LLM images, and it’s clear humans win. > "Average human" is a much lower bar than most people want to believe I have some basis for comparison. I’ve seen 6-year-olds draw better bikes than those LLMs. Look through that list again: the worst example doesn’t even have wheels, and several have wheels that aren’t connected to anything. Now, if you’re arguing the average human is worse than the average 6-year-old, I’m going to disagree. > Given mandatory art lessons in school are longer than 10 months, and yet those bike examples exist, I have no reason to believe this. Art lessons don’t cumulatively spend 10 months teaching people how to draw a bike. I don’t think I cumulatively spent 6 months drawing anything. Painting, collage, sculpture, coloring, etc.: art covers a lot, and it wasn’t an every-day or even every-year thing. My mandatory college class was art history; we didn’t create any art. You may have spent more time in class studying drawing, but that’s not some universal average. > If you automate it in literally the manner in this write-up (pairwise comparison via API calls to another model to get ELO ratings), ten thousand images is like $60-$90, which is on the low end for a human commission. Not every one of those images had a price tag, but one was 88 cents; times 10,000, that’s $8,800 just to make the images for a test. Even at 4¢/image you’re looking at $400. Cheaper models existed but fairly consistently had worse performance. |
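
To spell out the arithmetic, here is a quick sketch. The function name is made up for illustration, the two per-image prices are the ones quoted above, and it covers generation only, not the pairwise judging calls:

```python
def generation_cost_usd(cents_per_image: int, n_images: int) -> float:
    """Total cost in dollars of generating n_images at a flat per-image price.

    Hypothetical helper: assumes every image costs the same, which the
    thread does not actually establish (only some images had a price tag).
    """
    return cents_per_image * n_images / 100


N_IMAGES = 10_000  # size of the test set discussed in the thread

print(generation_cost_usd(88, N_IMAGES))  # 8800.0 (the 88-cent image)
print(generation_cost_usd(4, N_IMAGES))   # 400.0  (the 4-cent figure)
```

Either figure is already above the $60-$90 estimate before any comparison calls are counted.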