Hacker News
by poolio 1358 days ago
Yes, this is often a problem. We use view-dependent prompts (e.g. "cat wearing sunglasses, back view"), but the pretrained 2D model often does a poor job of interpreting non-canonical views and will put sunglasses on the back of the cat's head (as well as the front).
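For readers unfamiliar with the idea, a fixed view-dependent prompting scheme can be sketched roughly as below. This is a minimal illustration, not the authors' actual code: it assumes camera azimuth is measured in degrees with 0 facing the subject's front, and the thresholds (45°, 135°, 60° elevation) and function names are hypothetical. The only point it makes is that the 2D model receives a coarse text suffix like "back view", which it then has to interpret correctly.

```python
def view_suffix(azimuth_deg: float, elevation_deg: float) -> str:
    """Map a camera pose to a coarse view descriptor (hypothetical thresholds)."""
    # Cameras looking down from high above get a single "overhead" label.
    if elevation_deg > 60.0:
        return "overhead view"
    # Normalize azimuth to [0, 360); 0 is assumed to face the subject's front.
    az = azimuth_deg % 360.0
    # Angular distance from the front direction, in [0, 180].
    dist_from_front = min(az, 360.0 - az)
    if dist_from_front <= 45.0:
        return "front view"
    if dist_from_front >= 135.0:
        return "back view"
    return "side view"


def view_dependent_prompt(base: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append the view descriptor to the base prompt, e.g. '<base>, back view'."""
    return f"{base}, {view_suffix(azimuth_deg, elevation_deg)}"


if __name__ == "__main__":
    # A camera directly behind the cat, slightly above it.
    print(view_dependent_prompt("cat wearing sunglasses", 180.0, 10.0))
```

Run as a script, this prints `cat wearing sunglasses, back view`, which is exactly the kind of prompt the 2D model can misread by drawing sunglasses on the back of the head.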

>"cat wearing sunglasses, back view"

Bad prompt, missing implied antecedent/ambiguous subject...

You may want:

Back view of a cat which is wearing sunglasses; back view of a cat, but the view is wearing sunglasses; etc. I actually tried using projective terms from drafting books and didn't get great results. Anatomical terms didn't work either.

>Back view of a cat which is wearing sunglasses; back view of a cat, but the view is wearing sunglasses; etc. I actually tried using projective terms from drafting books and didn't get great results. Anatomical terms didn't work either.

In short: natural language is not good enough and you need a DSL. If only the last 60 years of language research had warned us of this.

Up next: English sentences are ambiguous and need context information to parse correctly. Machine learning community in shambles.

I mean... It kinda did. Sarcasm aside.

The trouble is all those darn uninitiated, and the attempt to build a generalized oracle that maps their unspecific ramblings to what they mean, freeing them from having to actually communicate properly...

Funnily enough, this actually overlaps with philosophy in a way most programmers scoff at: communication is frigging hard. Worse, detecting when someone is trying to get something across but just needs a nudge in the right direction to find the language to explain it is really damn hard.

I run into it every time I get a haircut. I have no idea how to speak their language, so it's always "Uh... A little off the top and rounded at the back, I guess?"

Yep, our fixed strategy for view-dependent prompting is silly, and there is tons of room for improvement!