by ilaksh 797 days ago
I think the meat of this is the text-to-image model. I hope you will upgrade to leading-edge models like DALL-E 3, Imagen 2, or SD 3 (when available), if you are not already using one.

If you are using an older model, upgrading would dramatically improve how well the images portray the virtual artist's vision.


The text-to-image model is an important component, but the current model in use is IMO good enough. My view for this project is that the internal monologue is more important than the output, so my wish is instead for a better open-weight LLM.
Which text to image and LLM models are you using?
LLM: Mixtral-8x7B. Text-to-image: one of the leading commercial models, whose TOS I may or may not be violating.
Mixtral is great. I assume you saw the DBRX release and the new, larger Mixtral release that came out over the last few days.
I did! I want to switch to Mixtral-8x22B, time permitting. During the development of Stream of Consciousness I already swapped LLMs twice. This space is moving incredibly fast.