by fouc 180 days ago
Pretty good. I've noticed the animation tends to veer off / hallucinate quite a lot near the end. It is clear that the model is not maintaining any awareness of the first image. I wonder if there's a way to keep the original image in the context, or add the original image back in at the halfway mark.
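One way to read "add the original image back in at the halfway mark" is to generate the clip in two chunks and start the second chunk from a blend of the last generated frame and the original image. The sketch below assumes a hypothetical `generate_clip(start_image, prompt, num_frames)` callable standing in for one image-to-video run (e.g. one pass of the workflow); `blend`, `generate_with_reanchor` and the `alpha` weight are illustrative names, not part of any real API. Images are plain nested lists of RGB tuples to keep it dependency-free.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]
# Hypothetical stand-in for one image-to-video run:
# (start_image, prompt, num_frames) -> list of generated frames.
GenerateClip = Callable[[Frame, str, int], List[Frame]]


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Per-pixel linear blend: alpha * a + (1 - alpha) * b."""
    return [
        [tuple(round(alpha * x + (1 - alpha) * y) for x, y in zip(pa, pb))
         for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def generate_with_reanchor(generate_clip: GenerateClip, original: Frame,
                           prompt: str, total_frames: int,
                           alpha: float = 0.5) -> List[Frame]:
    """Generate in two halves, pulling the second half back toward the original."""
    if total_frames < 2:
        raise ValueError("need at least 2 frames to split the clip")
    half = total_frames // 2
    first = generate_clip(original, prompt, half)
    # Restart from a mix of where the animation drifted to and the original image.
    # alpha is a hypothetical knob: 1.0 = pure continuation, 0.0 = full reset.
    restart = blend(first[-1], original, alpha)
    second = generate_clip(restart, prompt, total_frames - half)
    return first + second
```

Since each chunk is generated independently, motion may jump at the seam; a lower `alpha` pulls harder toward the original image at the cost of a more visible cut.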

Thank you. I've noticed that too, and also that it has a tendency to introduce garbled text when not given a prompt (or a short one).

This is using the default parameters for the ComfyUI workflow (including a negative prompt written in Chinese), so there is a lot of room for adjustments.

Oh, I was wondering why some of the hallucinations introduced Chinese text/visuals. I'm guessing that might be due to the negative prompt.
I think the main reason is that the model has a lot of training material with Chinese text in it (I'm assuming, since the research group who released it is from China), but having the negative prompt in Chinese might also play a role.

What I've found interesting so far is that sometimes the image plays a big part in the final video, but other times it gets discarded almost immediately after the first few frames. It really depends on the prompt, so prompt engineering is (at least for this model) even more important than I expected. I'm now thinking of adding a 'system' positive prompt and appending the user prompt to it.
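A 'system' positive prompt with the user prompt appended could look roughly like this. The `SYSTEM_PROMPT` wording and the `compose_prompt` helper are hypothetical; the empty-input fallback reflects the observation above that no prompt (or a short one) tends to produce garbled text.

```python
# Hypothetical server-side positive prompt; the wording is only an illustration.
SYSTEM_PROMPT = "children's drawing coming to life, same sketch style, gentle motion"


def compose_prompt(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Prepend a fixed 'system' prompt to whatever the user typed."""
    user_prompt = user_prompt.strip()
    if not user_prompt:
        # Empty or whitespace-only input: send the system prompt alone,
        # so the model never receives an empty positive prompt.
        return system_prompt
    return f"{system_prompt}, {user_prompt}"
```

For example, `compose_prompt("a cat dancing")` yields the system prompt followed by `", a cat dancing"`.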

Would be interesting to see how much a good "system"/server-side prompt could improve things. I noticed some animations kept the same sketch style even without specifying that in the prompt.
Could do something funky like convert it to grayscale, add a 4th "colour" channel and put the grayscale image in that.
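The 4th-channel idea can be sketched as: convert the original sketch to grayscale, then attach that value to every pixel of a frame as an extra channel. This assumes the pipeline could accept a fourth channel at all, which a model trained on RGB input generally would not without modification. The ITU-R BT.601 luma weights are a standard choice for the grayscale step; the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def to_grayscale(img: List[List[RGB]]) -> List[List[int]]:
    """Luma using ITU-R BT.601 weights (0.299, 0.587, 0.114)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in img]


def add_gray_channel(frame: List[List[RGB]],
                     sketch: List[List[RGB]]) -> List[List[RGBA]]:
    """Append the grayscale sketch as a 4th channel to every pixel of `frame`."""
    gray = to_grayscale(sketch)
    if len(frame) != len(gray) or any(
            len(fr) != len(gr) for fr, gr in zip(frame, gray)):
        raise ValueError("frame and sketch must have the same dimensions")
    return [[(r, g, b, v) for (r, g, b), v in zip(frow, grow)]
            for frow, grow in zip(frame, gray)]
```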
I'm actually trying to reduce the 'funkiness'. Initially the idea was to start from a child's sketch and bring it to life (so kids can safely use it as part of an exhibit at an art festival) :)

There's a world of possibilities though, I hadn't even thought of combining color channels.

I think they were suggesting that it might be possible to inject the initial sketch into every image/frame such that the model will see it but not the end user. Like a form of steganography, which might improve the model's ability to match the original style of the sketch.
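A minimal steganography sketch, assuming least-significant-bit (LSB) embedding as the technique: a thresholded 1-bit copy of the grayscale sketch is written into the lowest bit of each frame's blue channel, changing each value by at most 1. Whether a model would actually pick up such a faint signal is an open question; VAE encoding or lossy video compression may well erase it. Function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def embed_sketch_lsb(frame: List[List[RGB]], sketch_gray: List[List[int]],
                     threshold: int = 128) -> List[List[RGB]]:
    """Hide a 1-bit (thresholded) copy of the sketch in the blue channel's lowest bit."""
    if len(frame) != len(sketch_gray) or any(
            len(fr) != len(sr) for fr, sr in zip(frame, sketch_gray)):
        raise ValueError("frame and sketch must have the same dimensions")
    out = []
    for frow, srow in zip(frame, sketch_gray):
        new_row = []
        for (r, g, b), s in zip(frow, srow):
            bit = 1 if s >= threshold else 0
            # Clear the lowest bit, then set it to the sketch bit:
            # blue changes by at most 1, imperceptible to a viewer.
            new_row.append((r, g, (b & ~1) | bit))
        out.append(new_row)
    return out


def extract_sketch_lsb(frame: List[List[RGB]]) -> List[List[int]]:
    """Recover the hidden 1-bit sketch from the blue channel."""
    return [[b & 1 for (_, _, b) in row] for row in frame]
```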