by idiliv
856 days ago
People here seem mostly impressed by the high resolution of these examples. In my experience doing research on Stable Diffusion, scaling up resolution is the conceptually easy part: it only requires larger models and more high-resolution training data. The hard part is semantic alignment with the prompt. Attempts to scale Stable Diffusion, such as SDXL, have yielded only marginally better prompt understanding, likely because they still rely on CLIP prompt embeddings. So the key question here is how well Sora handles prompt alignment.