by sillysaurus3 3164 days ago
Latch on to that idea -- corners don't look like that -- and follow it to its logical conclusion: no one knows how to make any fully-simulated video look indistinguishable from reality.

That's a huge leap, but it has the benefit of being true.

(To be precise: no one has ever created fully-simulated video capable of fooling human observers anywhere close to 50% of the time. The video has to be reasonably long (>30sec) and complex. But in a double-blind test, almost any nature video will handily beat any synthetic video.)

4 comments

> (To be precise: no one has ever created fully-simulated video capable of fooling human observers anywhere close to 50% of the time. The video has to be reasonably long (>30sec) and complex. But in a double-blind test, almost any nature video will handily beat any synthetic video.)

Out of curiosity, is this from something? It sounds like you're citing something, and for realtime I'd agree. I think people are fooled every day by raytraced images (movie CG etc.).

> no one knows how to make any fully-simulated video look indistinguishable from reality.

I'd say we know how to do it, we don't know how to do it fast.

> I'd say we know how to do it, we don't know how to do it fast.

If all I do here today is hatch an egg of doubt in your head, I would be delighted. Someone needs to carry the torch. My life has turned to other interests.

It's the central problem. It's as hard as AGI, and it might be as impactful as the invention of the airplane.

Think of it. Fully-simulated video, indistinguishable from reality.

In many ways, it was my first love. The desire to be a gamedev drove me to learn programming. I ended up a graphics dev -- Carmack's old path. My job was to make a certain game engine look better. What an innocent problem... 12 years later, I still feel the pull, the need to blow off everything in my life and bend a computer to my will. To our will. The human mind has never once achieved this goal. And it was the perfect problem... I never cared much about fame or money. But to be the first. Think of it... How can you not want to spend the rest of your life on this? The solution is out there, taunting us. Everyone is pursuing physics, when all we need to do is pursue the fact that video cameras can already generate images that look identical to real life.

The ancients have made up stories about the sky and stars since long before civilization. Put yourself in their shoes, if they even had shoes.

Look up. It's the night sky, far brighter than anything we can see today. Imagine staring up at the infinite complexity, wondering, how does it work? Why do the stars go the way they go? We tell stories; could one of the stories be right?

A few millennia later, one person had an idea: What if we watch the stars very, very carefully, and collect the stories very carefully? We could compare the movements of the stars to the stories, so that the alternative theories might be distinguished from one another.

This was the key to modern science, and the root of wisdom. When you stop thinking about what everyone else is doing, you're free to hit on solutions that everyone else overlooked.

And think of the feeling you'd get when you finally solved it. Can you imagine? You'd get the same rush as the Wright brothers, or Ford when he made the assembly line, or McCarthy when he stumbled across Lisp.

If you or anyone else intends to take on this challenge, know this:

The fact that no one believes you when you say "No one has ever done this, and no one has any idea how to do it," is your biggest advantage.

It means you're free to spend the next five years figuring out that solution that everyone else missed because they were too busy chasing the pipe dream that if you throw enough physics+time at a computer, it will produce synthetic video that fools people into thinking it's real.

The moment people realize that it's probably not that hard, you'll lose your advantage, because every top tech company will start exploring your problem space. Like if in 1902 you'd hinted to a top university the gist of Einstein's thesis. No one would take you seriously. Lucky for you.

So, what's the secret technique? Well, if I knew that, I'd have fulfilled my 12-year dream. But I know a few things that will move you (12-N) years toward the goal.

There is one rule, and one rule alone. You have to force yourself to stay true to it, or else nothing else you do will matter. Here it is:

If you get a dozen people together, and show them a mix of 10 real videos and 10 simulated videos, and those videos are reasonably complex, like clips from a nature documentary, then 12 out of 12 people will effortlessly call out your fake videos as fake and your real videos as real. It's not even close. That's how far away we are from the goal of fully-simulated video indistinguishable from reality.

Maybe I've hooked you at this point. Maybe not. But if anyone comes up with a way to fool those 12 people so completely that their responses are no better than random chance, you win.

Let's call this the "Carmack criterion." If you tried administering the above test to a dozen clones of Carmack, here's how they'd sound: "That's a fake. That one's real. Fake. Fake. Real. Real." No matter how much ornament or showmanship you throw into the video, you can't fool Carmack. He'll report whatever his eyes are telling him.

And as of 2017, he'll be right 100% of the time. His eyes would shout: "None of those fakes were even close to real! Are you kidding? One of the real videos was of a lion taking down a gazelle. I know every artist in the gamedev industry. None of them have ever produced anything approaching that level of quality, even working together."

That, and that alone, is the game. Literally nothing else matters. If you can fool people until their responses are statistically identical to RNG, you've done it. You're world-famous. Yer a wizard.
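To make "statistically identical to RNG" concrete, here is a minimal sketch of how the judgments could be scored, assuming every real/fake call from every observer is pooled into one count (12 observers x 20 videos = 240 calls). The function names, the pooling, and the 0.05 threshold are illustrative assumptions, not part of the original test description. Note that failing to reject "pure guessing" is weaker than proving the videos are indistinguishable; a larger panel tightens it.

```python
from math import comb


def binomial_two_sided_p(correct: int, total: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value under the hypothesis of pure guessing.

    Sums the probability of every outcome that is no more likely than the
    observed number of correct calls.
    """
    def pmf(k: int) -> float:
        return comb(total, k) * chance**k * (1 - chance) ** (total - k)

    observed = pmf(correct)
    # Small relative tolerance so ties are not lost to floating-point noise.
    tail = sum(pmf(k) for k in range(total + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, tail)


def passes_criterion(correct: int, total: int, alpha: float = 0.05) -> bool:
    """True when the observers' accuracy cannot be told apart from guessing.

    alpha is an assumed significance threshold, chosen for illustration.
    """
    return binomial_two_sided_p(correct, total) >= alpha


# 12 observers, each judging 10 real and 10 simulated clips (hypothetical pooling).
TOTAL_CALLS = 12 * 20

# Today's situation as described above: every call is correct.
everyone_right = passes_criterion(TOTAL_CALLS, TOTAL_CALLS)   # False

# The winning situation: observers are right exactly half the time.
coin_flip = passes_criterion(TOTAL_CALLS // 2, TOTAL_CALLS)   # True
```

With all 240 calls correct the check fails outright; with 120 correct it passes, which is the condition described above.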

Corollary: you can use the Carmack criterion like a compass for every decision you face. Should you research physically-based rendering, or try to apply machine learning? The latter seems unpromising. Yet Hollywood has been administering the Carmack test to millions of people, most recently with Avatar, which completely rules out physically-based rendering. So we know to spend zero time on it.

As you can see, that razor is so sharp that it will cut away every illusion you might try to cling to that humans are anywhere close to achieving the Carmack criterion. Or that some smart hacker somewhere has a pretty good idea of how to achieve it, or that it's just a matter of letting computers advance another few decades, or any other false reason that those around you like to tell themselves.

But if mainstream ideas are dead-ends, then what should we research?

I hesitate to give concrete suggestions, because the history of science demonstrates that progress isn't made like that. Whatever the real solution is, it's far beyond anything you or I can imagine today. People were forced by mathematics to believe that planets' orbits were elliptical. An ellipse is the only shape that makes the numbers come out right. Yet how many of our ancestors came up with that idea? Even by accident, it's probably too bizarre for anyone to seriously consider it. Not without mathematics.

Yet that's a positive statement: It meant that if someone were audacious enough to trust in mathematics alone, they could determine the right answer. The solution was always there, waiting for you to find it.

To make any progress at all, your ideas will need to seem shockingly different. The whole world has spent two decades going over every inch of physically-based rendering -- presumably hoping that if they put on a different pair of glasses, maybe they'll spot anything other than a mountain of evidence that it doesn't work.

So you have to let yourself consider every angle, no matter how strange.

720 frames of 720p video. That's all you need. That's 30 seconds of HD footage. Get a computer to conjure up those 720 frames. Summon DaVinci's ghost, and you win.
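For a sense of scale, the arithmetic behind those numbers can be sketched as follows. The 1280x720 resolution and three color channels are assumptions (the usual meaning of "720p" with RGB output); the frame count and duration come from the text, which implies 24 frames per second.

```python
FRAMES = 720           # from the text
SECONDS = 30           # from the text
WIDTH, HEIGHT = 1280, 720   # assumption: standard 720p frame size
CHANNELS = 3           # assumption: one value each for red, green, blue

fps = FRAMES / SECONDS                      # implied frame rate
pixels_per_frame = WIDTH * HEIGHT           # pixels in a single frame
total_values = FRAMES * pixels_per_frame * CHANNELS  # numbers to get right

print(f"{fps:.0f} fps, {pixels_per_frame:,} pixels per frame, "
      f"{total_values:,} color values in total")
```

So the whole target is roughly two billion color values, which is small by modern standards; the difficulty is in choosing them, not in storing or computing that many.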

Whenever someone finally solves this, you'll think "Oh, right. That technique makes sense." But it only makes sense because you see it works. Till then, that correct answer will seem to be a complete waste of time.

Think of Airbnb, and how awful their idea sounded. Yet when someone spent a couple years exploring the problem space, shazam! Out popped a billion-dollar company.

Since there is ~zero chance these ideas are anywhere close to the right answer, here are the two avenues I left off with:

1. A video camera generates images that pass the Carmack criterion. Ask yourself: why do those videos look real? And why is it so important to judge video, not photos? (It's crucial.)

This is key: Are you absolutely certain you should be ignoring the fact that any old camcorder's videos look real? Whip out your phone. Take a video. That video looks real. Why? Quantify the difference vs footage of the latest game engine.

(Try to avoid using the latest movies as a basis of comparison, because movies mix real-life footage into their VFX. Our criterion of "fully-simulated video" is strict by design: it keeps us honest about our progress. Especially to ourselves.)

2. After meditating for a year on why crappy cellphone videos look real, you may start thinking along the lines of "how can I write a program to mimic the essence of that realism?" It looks real because the colors are exactly right. Think of evolution, and how long we've been evolving. That whole time, every single one of our ancestors was staring at images that they believed were real. Our brains are wired to notice even a hint of strangeness. ("When we notice there's something strange about a video, what exactly is going on there? What do we mean by that?" is another "fun" question.)

Now, wouldn't it be handy if someone knew how to write a program that can mimic real-life data? If only such a technique were possible... We even have an infinite stream of pre-classified data to feed it: phones and webcams.

Hmmm. :)

You've probably looked at this, but maybe it would help to avoid thinking "out of the box", and try it more incrementally: create an extremely simple cellphone video of an "easy" scene, like a teapot sitting on linoleum or something. Then try to recreate it -perfectly- using computer graphics. Get it to the level where you can literally compare the pixels for each frame.

Maybe that could bring you closer to understanding what the important factors are that the current graphics pipeline can't do. Why is it hard to get the pixels in the simulated video to be the same color as the cell phone ones? Are the materials off? The shadows? If you can't even make the teapot look real, then you've zeroed in on something fundamental that's still going to bite you when you're busy trying to rig antelope skeletons.
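As a rough sketch of what "compare the pixels for each frame" could look like, assuming both videos are already decoded, aligned, and the same size, with each frame stored as rows of (r, g, b) tuples. The function names and the choice of mean absolute error are illustrative assumptions; a real comparison would also need to handle camera noise, exposure, and alignment.

```python
def frame_mean_abs_error(real, rendered):
    """Mean absolute per-channel difference between two same-sized frames.

    Each frame is a list of rows; each row is a list of (r, g, b) tuples.
    """
    if len(real) != len(rendered) or any(
        len(a) != len(b) for a, b in zip(real, rendered)
    ):
        raise ValueError("frames must have identical dimensions")

    total = 0
    count = 0
    for row_real, row_rendered in zip(real, rendered):
        for px_real, px_rendered in zip(row_real, row_rendered):
            for a, b in zip(px_real, px_rendered):
                total += abs(a - b)
                count += 1
    return total / count


def per_frame_errors(real_video, rendered_video):
    """One error value per frame pair, in playback order."""
    if len(real_video) != len(rendered_video):
        raise ValueError("videos must have the same number of frames")
    return [frame_mean_abs_error(a, b) for a, b in zip(real_video, rendered_video)]


def worst_frame(errors):
    """Index of the frame where the render deviates most from the footage."""
    return max(range(len(errors)), key=errors.__getitem__)


# Tiny made-up example: two 1x2-pixel frames, the second render is slightly off.
footage = [[[(10, 20, 30), (40, 50, 60)]], [[(10, 20, 30), (40, 50, 60)]]]
render = [[[(10, 20, 30), (40, 50, 60)]], [[(10, 20, 30), (40, 50, 66)]]]

errors = per_frame_errors(footage, render)   # [0.0, 1.0]
```

Looking at which frames and which regions carry the largest error is one way to start asking whether the materials or the shadows are the problem.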

There's already one standard scene / benchmark like that, the Cornell Box: https://en.wikipedia.org/wiki/Cornell_box

Of course it's ridiculously simple, so you may want to increase the bar a little to impress GP :p

Thanks for sharing!

If it gives you any sense of hope, I was actually paraphrasing Carmack himself as part of the source of "we already know how". I believe from a Quakecon Keynote (I'm not sure what year unfortunately, the context was PBR and approximations).

The real thing I'm not certain of is whether we want to live in a world where we can create, in realtime, video content entirely indistinguishable from reality.

The entertainment and simulation benefits would obviously be amazing. The ability to recreate phenomena we can observe in science but can't see, etc.

But there is also the possibility of weaponizing that. Seeing is believing, and what do we do in a future where we can't trust anything we see?

Part of me is glad we don't have to ask ourselves these questions yet.

While not video, check out any IKEA catalogue of the past couple of years: it's roughly 50/50 photo versus CGI and you can't tell which is which. They got their photo crew CGI training and vice versa.

> no one knows how to make any fully-simulated video look indistinguishable from reality.

Does that need qualifying with "in a reasonable time frame and/or budget"? Or is it really just that bad?

For what it's worth, I spent about a decade trying to chase down that answer.

If you bet $3,000 that "we have no idea what we're doing" is an accurate assessment, you'd win.

How can that be? Because color science is very difficult. Your eyes are designed to fool you.

When you're born into a certain time period -- a random slice of human history -- the probability the dominant school of thought is mistaken is nearly 1. Wouldn't it be remarkable if we were the first generation who figured out all the truths?

The hardest part is admitting to yourself that it might be true. Could it be possible? Has the world collectively been using techniques that are nowhere close to the final answer?

I launched myself into that question with an open mind. As far as I can tell, the answer is yes.

From having worked in the industry, it's pretty accurate to say that most graphics programmers haven't read any books on color, or the human visual system. I nearly didn't. I was dragged into it because I kept getting strange answers when I tried to mix colors and quantify the diffs -- I was trying to do the same experiment that CIE 1931 did, but I got very different results. That led me to the Munsell color system, and to the history of color theory.

If you glance over the history, you'll notice that our understanding of color keeps changing. The models keep being updated; we can never quite figure out whether they're right. If CIE were perfectly accurate, we'd never have invented LAB space, because CIE would perfectly match nature. Right?
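For readers who haven't met these models, here is a minimal sketch of the textbook conversion from CIE 1931 XYZ to LAB (CIELAB, 1976) and the simplest color-difference measure built on it (Delta E 1976). The D65 white point is an assumption for illustration; nothing here is claimed to be the procedure I used in my own experiments. The later revisions of the difference formula are themselves an example of the models being updated.

```python
from math import sqrt

# Assumption: D65 reference white, 2-degree observer, Y normalized to 100.
D65_WHITE = (95.047, 100.0, 108.883)
DELTA = 6 / 29


def _f(t: float) -> float:
    """Cube root with a linear segment near zero, as in the CIELAB definition."""
    if t > DELTA**3:
        return t ** (1 / 3)
    return t / (3 * DELTA**2) + 4 / 29


def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert CIE XYZ tristimulus values to CIELAB (L*, a*, b*)."""
    fx, fy, fz = (_f(c / w) for c, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e_76(lab1, lab2) -> float:
    """CIE76 color difference: Euclidean distance in LAB space."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))


white_lab = xyz_to_lab(D65_WHITE)        # approximately (100, 0, 0)
black_lab = xyz_to_lab((0.0, 0.0, 0.0))  # approximately (0, 0, 0)
difference = delta_e_76(white_lab, black_lab)
```

The point of LAB was that equal distances should look like equal differences to a human; how well that holds is exactly the kind of thing you can test with your own eyes.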

Munsell tried a different approach. Rather than coming up with a fine-sounding theory and curve-fitting it to the data, he built a model directly from the data. One of the most powerful techniques at our disposal is to use our own eyes as a null instrument. You have to have absolute confidence in your own judgement -- I cannot overstate how easy it is to fool yourself -- but if you are as methodical as a robot, you can come up with surprising answers. When those answers contradict the established science that everyone believes, you start to worry. Maybe you weren't careful enough, right? They must know what they're doing; this is what everyone believes, after all.

No... At the end of it, you discover that it really is that strange. Color science is one of the hardest to quantify. There are hard answers, but only when you strip away all the context your eyes rely on. When you look at something, you see literally a million clues that tell your brain it's a 3D shape and that X color is brighter than Y because of Z. Nature has spent a billion years evolving your brain to be able to process all of that instantly. It's impossible to be consciously aware of everything that's happening.

The only way to answer your question is to (a) come up with a methodical test, (b) conduct it meticulously, then (c) trust in yourself and the fact that you are competent and were extremely careful.

If you do all three of those things, you will be dragged kicking and screaming to the conclusion that not one person anywhere in the world has any idea how to generate 100% synthetic video. We don't even know where to begin. No one knows even roughly what the final techniques might look like.

Think about how integral a good artist is. Every rendering pipeline in the world is built for artist flexibility. When a talented team of artists feel empowered by the tools you write for them, they end up producing a different kind of movie altogether. It's not a matter of degree. The reason movies look incredible is because artists mastered the tools we make for them. That's their role, and this is ours. Both halves are crucial.

Yet what does that imply? Imagine we invent a program that produces perfectly real video. People think it looks like a nature documentary. Now think about everything an artist does in a modern pipeline: they decide which shaders to use. Which materials to apply, and to what. The base color of everything. The shape and the animation. They arbitrarily select the physics. When grass changes color from green to brown, it's because the atoms it's made of are changing -- everything that makes light bounce off grass in a way that looks real, those are the parameters that artists change "till it looks good." It's arbitrary. It makes no sense to say with a straight face that we've created a "physically based renderer" when the artists have complete authority to break every assumption and piece of data that those physics simulations were modeled from.

The fact that they have so much flexibility is a strong hint that we are very far from mastering this. If artists' jobs were mostly identical to a set designer's job -- placing lights, arranging the scene -- then our renderer must look so real that it may as well be reality, right? If it looked perfectly real, there would be no reason to change it, except as a stylistic choice (which is fine, but it's unrelated to our goals).

Now, people will immediately try to convince you that there are engines out there that work that way. Artists are mostly set designers, they say. But all you have to do is look. Take a clip from whatever movie they produced, and put it side by side with a nature video. Then put it next to the most visually cutting-edge movie you can think of. An honest assessment will show a striking difference.

Lack of flexibility kills the art. The flexibility is the only technique we have. The fact that you can get really talented artists together and give them highly advanced tools, and they end up spitting out stuff that looks real -- it's not inherently obvious that we should've been able to invent those techniques! The fact that it's possible at all is amazing. When graphics programmers believe in the ideology of physically-based rendering, they become slaves to hubris. They start thinking it's reasonable to take away the only tools that work.

Ask yourself this: When an artist is free to flex all the parameters until it looks real, what's going on there? What does that mean, in a fundamental sense?

It's a deep question, and I still haven't come up with a complete answer. But I think it's reasonable to say that artists attempt to make the output on the screen match the output that a video camera would have recorded, if the scene were real. Yes, they make a few stylistic tweaks, but all of it still looks awesome. That's why people pay to experience it. It's partly why Star Wars was such a hit. It was believable.

And that, my friend, is the real question. Asking "Do we know how to make something look real, if only we spent enough money or CPU power on it?" turns out not to make any sense. Counterintuitive, yet true. People care about making movies or manipulating images in photoshop or making games look awesome. They don't care about wasting time trying to coax the computer into generating video that can fool an audience -- they already have a thousand techniques for fooling them! Why generate it when you can mix in actual video from the real world?

As strange as it sounds, I think the full answer is: no one realizes we have no idea how to generate video of complex scenes indistinguishable from reality in a double-blind test because there's no money in it. Not yet. If you happen to invent it, your company might make a million dollars. But you're more likely to lose a million by trying to achieve that objective.

What about scientists? Surely some of them must have spent their lives trying to answer such a deep and fundamental mystery?

Yes and no. There is a lot of impressive work out there, but scientists are mainly concerned about getting published. Their careers are at stake. If you don't publish, you can't get funding, and your impact comes to an end. And the problem is ambiguous: what does it mean to publish a paper related to the idea that people don't know how to generate synthetic images that look real? Everyone already knows that! You can't write a paper on that. The best you can do is try to come up with a paper about an incremental improvement.

And that's exactly what we see. It's all we see. Negative results in science are mostly discarded -- much of the time, we simply don't hear about them. I am speculating, but I think this would be even worse in color science: it's not very prestigious work. When you run an experiment to validate the CIE model and end up with wildly different answers, what do you do? As a scientist with a deadline that may literally kill your career, how likely are you to chase down this mystery? Or to have the freedom to suddenly pivot, and to make the paper about that?

It felt strange to realize no one knows how to make 100% synthetic video look real. It's like sinking up to your neck in quicksand: an inescapable conclusion, and consequences people would rather not dwell on.

Focus on the data. That's the key. When you rely not on opinions, not on what the professionals believe, but on data -- hard answers, obtained from careful experiment with a large sample size -- you arrive at some very unexpected truths.

It's hard to know what to even do with the information. What do you even say? I would've dismissed this at 23. "What are the chances that everyone in the world is being sent to college to learn the wrong techniques? And what about all the published research? The guy who inspired me to become a graphics programmer worked so hard on his graphics engine. He spent years thinking about it. You're saying he has no idea what he's doing, and that all the techniques are fundamentally flawed? That they're not even kinda-sorta close?"

All I can say is, look at the data. Pretend you're piloting a plane in complete darkness. You either trust your instruments -- your carefully-designed experiments -- or you don't.

My main problem with it isn’t that it’s unrealistic per se, just that it looks bad on static geometry. But SSAO is typically sold as a “realism” feature.