by amanuonsense 1351 days ago
> The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that.

Stable Diffusion was trained on images of charts and graphs. It knows what a PowerPoint presentation and even an Excel spreadsheet look like.

Here:

https://imgur.com/a/V4a6W4I

It just doesn't know how to generate a graph like the one it's asked to.


It's still stupid. This is like asking DALL-E to generate an image that solves a math equation step by step. Of course this is easier for a human to do.

Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though. AI will give you that in seconds easily, and that's what's amazing about it.

I was in a meeting on Cognitive AI at the Royal Society in London last week where a gentleman from Stanford presented work in which GPT-3 was prompted to solve math equations step-by-step and did well (better than I would have expected). Point being, if GPT-3 can do it, DALL-E should also be able to do it, and testing whether that is the case is not stupid, but interesting.

The big question with systems like those image generation models is to what extent their generation can be controlled, and how much sense it makes. This is exactly the kind of testing that has to be done to answer such questions. Just flooding social media with cherry-picked successes doesn't help answer any questions at all. Because cherry-picking never does.

To be honest, I don't get the defensiveness of the comments in this thread. Half the comments are trying to call foul by invoking some rule they made up on the spot, according to which "that's not how you should use it". The other half pretend they knew all along what the result would be, and yet they're still upset that someone went and tried it, and posted about it. That kind of reaction is not coming from a place of inquisitiveness, or curiosity, that is for sure. It's just some kind of sclerotic reaction to novelty, people throwing their toys because someone went and did something they hadn't thought about.

> Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though.

In another comment posted in this thread I tried to get Stable Diffusion to give me a graph with three lines in the style of van Gogh and other famous artists. I'd be very curious to see what that would look like and I can't imagine it easily. I'm left wondering, because Stable Diffusion can't do it. Maybe I should ask someone on Fiverr.

I'm not saying there were zero such images, but it obviously wasn't the focus compared to art-type images.

What you said was that they weren't trained on chart images, not that they weren't the focus:

> The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that.

I have no idea how you could even know what was, or wasn't, in those models' training sets. Yet you posted with conviction as if you were sure you knew. What's the point of that?

Edit - Also, what do you mean "it obviously wasn't the focus"? The focus of what? The focus of training, or the focus of presenting the results on social media?

This is absurdly silly. These data sets contain millions of images from web crawls at a bare minimum, and often billions, so of course there will be a non-zero number of charts in them. If you want to be pedantic about it, be my guest, I guess.

You could probably find a few driver's ed teachers who taught their students to do doughnuts too, but saying "driver's ed teachers don't teach their students to do doughnuts" would nonetheless be largely accurate.

Silly yourself. If there were simply a "non-zero" number of charts in them, the model wouldn't have, you know, modelled them. That the model can reproduce graphs is clear evidence that it saw enough graphs to reproduce them.

And don't call me silly just because you used imprecise language to try to make a vague point with great conviction as if you absolutely knew what you're talking about, when you absolutely didn't. Show some respect to the intellect of your interlocutor, will you?

And, seriously, you haven't answered my question: the focus of what? What do you mean by "it obviously wasn't the focus"?

I think you were emboldened by the downvoting of my comment and assumed you don't need to make sense, but I think the downvoters were downvoting something other than the question you refuse to answer.