Perhaps one way to look at this could be auto-scaffolding. The typical modelling and CAD tools might include this feature to get you up and running faster.
Another massive benefit is composability. If the model can generate a cup and a table, it also knows how to generate a cup on a table.
Think of all the complex gears and machine parts this could generate in the blink of an eye, while being relevant to the project - rotated and positioned exactly where you want them. Very similar to how GitHub Copilot works.
I don't see that LLMs have come that much further in 3D animation than programming in this regard: they can spit out bits and pieces that look okay in isolation, but a human needs to solve the puzzle. And often solving the puzzle means rewriting/redoing most of the pieces.
We're safe for now but we should learn how to leverage the new tech.
Probably, but isn't that how most of the technical fields go? Software in particular moves blazing fast and you need to adapt to the market quickly to be marketable.
Some are safe for several years (3-5), that's it. During that time it's going to wreck the bottom tiers of employees and progressively move up the ladder.
GPT and the equivalent will be extraordinary at programming five years out. It will end up being a trivially easy task for AI in hindsight (15-20 years out), not a difficult task.
Have you seen how far things like MidJourney, Dalle, Stable Diffusion have come in just a year or two? It's moving extremely fast. They've gone from generating stick figures to realistic photographs in two years.
The reason AI generative tools are faster to become useful in artistic areas is that in the arts you can take “errors” as style.
Doesn’t apply too much to mesh generation but was certainly the case in image gen. Mistakes that wouldn’t fly for a human artist (hands) were just accepted as part of AIgen.
So these areas are much less strict about precision than coding. That makes these tools much more capable of replacing artists in some tasks than Copilot is of replacing coders atm.
So you're probably familiar with the role of a Bidding Producer; imagine the difficulty they are facing: on one side they have filmmakers saying they just read so and so is now created by AI, while that is news to the bidding producer and their VFX/animation studio clients scrambling as everything they do is new again.
I don't know, 3D CGI has already been moving at breakneck speed for the last three decades without any AI. Today's tools are qualitatively different (sculpting, simulation, auto-rigging etc etc etc).
3D CGI has gotten faster, but I haven’t seen any qualitative jump for quite some time.
IMO the last time a major tech advance was visible was Davy Jones on the Pirates films. That was a fully photorealistic animated character that was plausible as a hero character in a major feature. That was a breakthrough. After that a lot of refinement and speeding up.
This is different. I have some positivity about it, but it’s getting hard to keep track of everything that’s going on tbh. Every week it’s a new application and every few months it’s some quantum leap.
Like others said, Midjourney and DallE are essentially photorealistic.
It seems to me that the next step is generative AI creating better and better assets.
And then of course you have video generation which is happening as well…
Both DE3 and MJ are essentially toys for single random pictures, unusable in a professional setting. DALL-E in particular has really bad issues with quality, and while it follows the prompt well it also rewrites it so it's barely controllable. Midjourney is RLHF'd to death.
What you want for asset creation is not photorealism, but style and concept transfer, multimodal controllability (text alone is terrible at expressing artistic intent), and tooling. And tooling isn't something that is developed quickly (although there were several rapid breakthroughs in the past, for example ZBrush).
Most of the fancy demos you hear about sound good on paper, but don't really go anywhere. Academia is throwing shit at the wall to see what sticks, this is its purpose, especially when practice is running ahead of theory. It's similar to building airplanes before figuring out aerodynamics (which happened long ago): watching a heavier-than-air thing fly is amazing, until you realize it's not very practical in the current form, or might even kill its brave inventor who tried to fly it.
If you look at the field closely, most of the progress in visual generative tooling happens in the open source community; people are trying to figure out what works in real use and what doesn't. Little is being done in big houses, at least publicly and for now, as they're more interested in a DC-3 than a Caproni Ca.60. The change is really incremental and gradual, similarly to the current mature state of 3D. Paradigms are different but they are both highly technical and depend on academic progress. Once it matures, it's going to become another skill-demanding field.
With respect, I disagree with almost everything you said.
The idea that somehow “AI isn’t art directable” is one I keep hearing, but I remain unconvinced this is somehow an unsolvable problem.
The idea that AIgen is unusable at the moment for professional work doesn’t hold up to my experience since I now regularly use Photoshop’s gen feature.
Photoshop combined with Firefly is exactly the rare kind of good tooling I'm talking about. In/outpainting was found to be working for creatives in practice, and got added to Photoshop.
>The idea that somehow “AI isn’t art directable” is one I keep hearing, but I remain unconvinced this is somehow an unsolvable problem.
That's not my point. AI can be perfectly directable and usable, just not in the specific form DE3/MJ do it. Text prompts alone don't have enough semantic capacity to guide it for useful purposes, and the tools they have (img2img, basic in/outpainting) aren't enough for production.
In contrast, Stable Diffusion has a myriad of non-textual tools around it right now - style/concept/object transfer of all sorts, live painting, skeleton-based character posing, neural rendering, conceptual sliders that can be created at will, lighting control, video rotoscoping, etc. And plugins for existing digital painting and 3D software leveraging all this witchcraft.
All this is extremely experimental and janky right now. It will be figured out in the upcoming years, though. (if only the community's brains weren't deep fried by porn...) This is exactly the sort of tooling the industry needs to get shit done.
Ah ok yes I agree. How many years is really the million dollar question. I’ve begun to act as if it’s around 5 years and sometimes I think I’m being too conservative.
You can remain unconvinced but it's somewhat true.
I can keep writing prompts for DE3 or similar until it gives me something like what I want, but the problem is, there are often subtle but important mistakes in many images that are generated.
I think it's really good at portraits of people, but for anything requiring complex lighting, representation of real world situations or events, I don't think it's ready yet, unless we're ready to just write prompts, click buttons and just accept what we receive in return.
Midjourney already has tools that allow you to select parts of the image to regenerate with new prompts, Photoshop-style. The tools are being built, even if a bit slowly, to make these things useful.
I could totally see creating matte paintings through Midjourney for indie filmmaking soon, and for tiny budget films, using a video generative tool to make, let’s say, zombies in the distance seems within reach now or very soon. For some kinds of VFX, I think AI will slowly start being able to replace the human element.
I'm not a professional in VFX, but I work in television and do a lot of VFX/3D work on the side. The quality isn't amazing, but it looks like this could be the start of a Midjourney-tier VFX/3D LLM, which would be awesome. For me, this would help bridge the gap between having to use/find premade assets and building what I want.
For context, building from scratch in a 3D pipeline requires you to wear a lot of different hats (modeling, materials, lighting, framing, animating, etc.). It costs a lot of time not only to learn these hats but also to use them together. The individual complexity of those skill sets makes it difficult to experiment and play around, which is how people learn with software.
The shortcut is using premade assets or addons. For instance, being able to use the Source game assets in Source Filmmaker, combined with SFM using a familiar game engine, makes it easy to build an intuition for the workflow. This makes Source Filmmaker accessible and it's why there's so much content out there made with it. So if you have gaps in your skillset or need to save time, you'll buy/use premade assets. This comes at a cost of control, but that's always been the tradeoff between building what you want and building with what you have.
Just like GPT and DALL-E built a bridge between building what you want and building with what you have, a high fidelity GPT for the 3D pipeline would make that world so much more accessible and would bring the kind of attention NLE video editing got in the post-YouTube world. If I could describe in text and/or generate an image of a scene I want and have a GPT create the objects, model them, generate textures, and place them in the scene, I could suddenly just open Blender, describe a scene, and just experiment with shooting in it, as if I was playing in a sandbox FPS game.
I'm not sure if MeshGPT is the ChatGPT of the 3D pipeline, but I do think this kind of content generation is the conduit for the DALL-E of video that so many people are terrified of and/or excited for.
I think producer roles are a little bit less ultra competitive / scarce as they are actually jobs jobs where you have to use Excel and do planning and budgeting.
Being a producer means being on the phone all the time, negotiating, haggling, finding solutions where they don’t seem to exist.
Be it in TV, advertising or somewhere in the media space, the common rule is that producers are mostly actually terrible at their jobs, that’s my experience in London. So if she’s really good and really dedicated and learns the job of everyone on set, I’d say she has a shot.
The real secret to being good in filmmaking is learning everyone else’s job. Toyota Production System says if you want to run a production line you have to know how it works.
If she wants to do VFX production she could start doing her own test scenes, learning the basics in Nuke and Blender, even understanding the role of Houdini and how that works.
If she does that - any company will be lucky to have her.