by sillysaurus3 3164 days ago
I'd say we know how to do it, we don't know how to do it fast.

If all I do here today is hatch an egg of doubt in your head, I would be delighted. Someone needs to carry the torch. My life has turned to other interests.

It's the central problem. It's as hard as AGI, and it might be as impactful as the invention of the airplane.

Think of it. Fully-simulated video, indistinguishable from reality.

In many ways, it was my first love. The desire to be a gamedev drove me to learn programming. I ended up a graphics dev -- Carmack's old path. My job was to make a certain game engine look better. What an innocent problem... 12 years later, I still feel the pull, the need to blow off everything in my life and bend a computer to my will. To our will. The human mind has never once achieved this goal. And it was the perfect problem... I never cared much about fame or money. But to be the first. Think of it... How can you not want to spend the rest of your life on this? The solution is out there, taunting us. Everyone is pursuing physics, when all we need to do is pursue the fact that video cameras can already generate images that look identical to real life.

The ancients have made up stories about the sky and stars since long before civilization. Put yourself in their shoes, if they even had shoes.

Look up. It's the night sky, far brighter than anything we can see today. Imagine staring up at the infinite complexity, wondering, how does it work? Why do the stars go the way they go? We tell stories; could one of the stories be right?

A few millennia later, one person had an idea: What if we watch the stars very, very carefully, and collect the stories very carefully? We could compare the movements of the stars to the stories, so that the alternative theories might be distinguished from one another.

This was the key to modern science, and the root of wisdom. When you stop thinking about what everyone else is doing, you're free to hit on solutions that everyone else overlooked.

And think of the feeling you'd get when you finally solved it. Can you imagine? You'd get the same rush as the Wright brothers, or Ford when he made the assembly line, or McCarthy when he stumbled across Lisp.

If you or anyone else intends to take on this challenge, know this:

The fact that no one believes you when you say "No one has ever done this, and no one has any idea how to do it," is your biggest advantage.

It means you're free to spend the next five years figuring out that solution that everyone else missed because they were too busy chasing the pipe dream that if you throw enough physics+time at a computer, it will produce synthetic video that fools people into thinking it's real.

The moment people realize that it's probably not that hard, you'll lose your advantage, because every top tech company will start exploring your problem space. Like if in 1902 you'd hinted to a top university the gist of Einstein's thesis. No one would take you seriously. Lucky for you.

So, what's the secret technique? Well, if I knew that, I'd have fulfilled my 12-year dream. But I know a few things that will move you (12-N) years toward the goal.

There is one rule, and one rule alone. You have to force yourself to stay true to it, or else nothing else you do will matter. Here it is:

If you get a dozen people together, and show them a mix of 10 real videos and 10 simulated videos, and those videos are reasonably complex, like clips from a nature documentary, then 12 out of 12 people will effortlessly call out your fake videos as fake and your real videos as real. It's not even close. That's how far away we are from the goal of fully-simulated video indistinguishable from reality.

Maybe I've hooked you at this point. Maybe not. But if anyone comes up with a way to fool those 12 people so completely that their responses are no better than random chance, you win.
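To make "no better than random chance" concrete, here is a minimal sketch of how the panel's answers could be scored. It is only one way to do it, not a method from the comment itself: it assumes every judgment (one person, one video) is an independent trial, pools all 12 x 20 = 240 of them, and runs an exact two-sided binomial test against a fair coin. The function names and the 0.05 threshold are made up for illustration.

```python
from math import comb


def binomial_two_sided_p(correct: int, total: int) -> float:
    """Exact two-sided p-value for H0: every judgment is a fair coin flip."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    # Probability of each possible number of correct answers under H0.
    probs = [comb(total, k) * 0.5 ** total for k in range(total + 1)]
    observed = probs[correct]
    # Add up every outcome that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


def looks_like_chance(correct: int, total: int, alpha: float = 0.05) -> bool:
    """True if the panel's score cannot be told apart from guessing.

    alpha is an assumed significance threshold, not something the
    original comment specifies.
    """
    return binomial_two_sided_p(correct, total) >= alpha


if __name__ == "__main__":
    judgments = 12 * 20  # 12 people, 10 real + 10 simulated videos each
    print(looks_like_chance(240, judgments))  # everyone right every time
    print(looks_like_chance(120, judgments))  # exactly half right
```

One caveat: failing to reject the coin-flip hypothesis is weaker than proving the videos are indistinguishable, so a real experiment would want far more than a dozen viewers.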

Let's call this the "Carmack criterion." If you tried administering the above test to a dozen clones of Carmack, here's how they'd sound: "That's a fake. That one's real. Fake. Fake. Real. Real." No matter how much ornament or showmanship you throw into the video, you can't fool Carmack. He'll report whatever his eyes are telling him.

And as of 2017, he'll be right 100% of the time. His eyes would shout: "None of those fakes were even close to real! Are you kidding? One of the real videos was of a lion taking down a gazelle. I know every artist in the gamedev industry. None of them have ever produced anything approaching that level of quality, even working together."

That, and that alone, is the game. Literally nothing else matters. If you can fool people until their responses are statistically identical to RNG, you've done it. You're world-famous. Yer a wizard.

Corollary: you can use the Carmack criterion like a compass for every decision you face. Should you research physically-based rendering, or try to apply machine learning? The latter seems unpromising. Yet Hollywood has been administering the Carmack test to millions of people, most recently with Avatar, and audiences still spot the CG. That completely rules out physically-based rendering. So we know to spend zero time on it.

As you can see, that razor is so sharp that it will cut away every illusion you might try to cling to that humans are anywhere close to achieving the Carmack criterion. Or that some smart hacker somewhere has a pretty good idea of how to achieve it, or that it's just a matter of letting computers advance another few decades, or any other false reason that those around you like to tell themselves.

But if mainstream ideas are dead-ends, then what should we research?

I hesitate to give concrete suggestions, because the history of science demonstrates that progress isn't made like that. Whatever the real solution is, it's far beyond anything you or I can imagine today. People were forced by mathematics to believe that planets' orbits were elliptical. An ellipse is the only shape that makes the numbers come out right. Yet how many of our ancestors came up with that idea? Even by accident, it's probably too bizarre for anyone to seriously consider it. Not without mathematics.

Yet that's a positive statement: It meant that if someone were audacious enough to trust in mathematics alone, they could determine the right answer. The solution was always there, waiting for you to find it.

To make any progress at all, your ideas will need to seem shockingly different. The whole world has spent two decades going over every inch of physically-based rendering -- presumably hoping that if they put on a different pair of glasses, maybe they'll spot anything other than a mountain of evidence that it doesn't work.

So you have to let yourself consider every angle, no matter how strange.

720 frames of 720p video. That's all you need. That's 30 seconds of HD footage. Get a computer to conjure up those 720 frames. Summon DaVinci's ghost, and you win.
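For a sense of scale, the arithmetic can be sketched as below. The comment gives only "720 frames", "720p", and "30 seconds"; the 24 fps frame rate (which is what makes 720 frames come out to 30 seconds), the 1280x720 resolution, and the uncompressed 8-bit RGB layout are assumptions.

```python
FRAMES = 720
WIDTH, HEIGHT = 1280, 720   # assumed pixel dimensions for "720p"
FPS = 24                    # assumed; 720 frames / 24 fps = 30 seconds
BYTES_PER_PIXEL = 3         # assumed 8-bit RGB, no compression

duration_s = FRAMES / FPS
total_pixels = FRAMES * WIDTH * HEIGHT
raw_bytes = total_pixels * BYTES_PER_PIXEL

if __name__ == "__main__":
    print(f"duration: {duration_s} s")
    print(f"pixels to generate: {total_pixels:,}")
    print(f"raw size: {raw_bytes / 1e9:.2f} GB")
```

So "all you need" is roughly 660 million pixels, every one of which has to survive a human looking at it.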

Whenever someone finally solves this, you'll think "Oh, right. That technique makes sense." But it only makes sense because you see it works. Till then, that correct answer will seem to be a complete waste of time.

Think of Airbnb, and how awful their idea sounded. Yet when someone spent a couple years exploring the problem space, shazam! Out popped a billion-dollar company.

Since there is ~zero chance these ideas are anywhere close to the right answer, here are the two avenues I left off with:

1. A video camera generates images that pass the Carmack criterion. Ask yourself: why do those videos look real? And why is it so important to judge video, not photos? (It's crucial.)

This is key: Are you absolutely certain you should be ignoring the fact that any old camcorder's videos look real? Whip out your phone. Take a video. That video looks real. Why? Quantify the difference vs footage of the latest game engine.

(Try to avoid using the latest movies as a basis of comparison, because movies mix real-life footage into their VFX. Our criterion of "fully-simulated video" is strict by design: it keeps us honest about our progress. Especially to ourselves.)

2. After meditating for a year on why crappy cellphone videos look real, you may start thinking along the lines of "how can I write a program to mimic the essence of that realism?" It looks real because the colors are exactly right. Think of evolution, and how long we've been evolving. That whole time, every single one of our ancestors was staring at images that they believed were real. Our brains are wired to notice even a hint of strangeness. ("When we notice there's something strange about a video, what exactly is going on there? What do we mean by that?" is another "fun" question.)

Now, wouldn't it be handy if someone knew how to write a program that can mimic real-life data? If only such a technique were possible... We even have an infinite stream of pre-classified data to feed it: phones and webcams.

Hmmm. :)

2 comments

You've probably looked at this, but maybe it would help to avoid thinking "out of the box", and try it more incrementally: create an extremely simple cellphone video of an "easy" scene, like a teapot sitting on linoleum or something. Then try to recreate it -perfectly- using computer graphics. Get it to the level where you can literally compare the pixels for each frame.

Maybe that could bring you closer to understanding what the important factors are that the current graphics pipeline can't do. Why is it hard to get the pixels in the simulated video to be the same color as the cell phone ones? Are the materials off? The shadows? If you can't even make the teapot look real, then you've zeroed in on something fundamental that's still going to bite you when you're busy trying to rig antelope skeletons.
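A rough sketch of what "compare the pixels for each frame" could look like, in plain Python so it runs anywhere. It assumes both videos are already decoded into same-sized frames of 8-bit RGB tuples and are perfectly aligned in time and space, which is the hard part in practice. The helper names `frame_mae` and `worst_frame` are hypothetical, and mean absolute error is just one possible metric.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each 0..255
Frame = List[List[Pixel]]      # rows of pixels


def frame_mae(real: Frame, fake: Frame) -> float:
    """Mean absolute per-channel difference between two same-sized frames."""
    total = 0
    count = 0
    for real_row, fake_row in zip(real, fake):
        for real_px, fake_px in zip(real_row, fake_row):
            for a, b in zip(real_px, fake_px):
                total += abs(a - b)
                count += 1
    if count == 0:
        raise ValueError("frames are empty")
    return total / count


def worst_frame(real_video: Sequence[Frame],
                fake_video: Sequence[Frame]) -> Tuple[int, float]:
    """Return (index, error) of the frame where the render is furthest off."""
    errors = [frame_mae(r, f) for r, f in zip(real_video, fake_video)]
    if not errors:
        raise ValueError("videos are empty")
    index = max(range(len(errors)), key=errors.__getitem__)
    return index, errors[index]


if __name__ == "__main__":
    # Two tiny 1x2 "frames"; the render gets one red channel wrong by 6.
    camera = [[(10, 10, 10), (20, 20, 20)]]
    render = [[(10, 10, 10), (26, 20, 20)]]
    print(frame_mae(camera, render))
    print(worst_frame([camera, camera], [camera, render]))
```

Looking at where the worst frames and worst regions cluster (shadows, material edges, highlights) is the point of the exercise; the single number matters less than what it points you at.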

There's already one standard scene / benchmark like that, the Cornell Box: https://en.wikipedia.org/wiki/Cornell_box

Of course it's ridiculously simple, so you may want to increase the bar a little to impress GP :p

Thanks for sharing!

If it gives you any sense of hope, I was actually paraphrasing Carmack himself as part of the source of "we already know how". I believe it was from a QuakeCon keynote (I'm not sure what year, unfortunately; the context was PBR and approximations).

The real thing I'm not certain of is whether we want to live in a world where we can create, in real time, video content entirely indistinguishable from reality.

The entertainment and simulation benefits would obviously be amazing. We could recreate phenomena that science can observe but that we can't see, etc.

But there is also the possibility of weaponizing that. Seeing is believing, so what do we do in a future where we can't trust anything we see?

Part of me is glad we don't have to ask ourselves these questions yet.