by HanClinto 809 days ago
> I am planning on doing some more articles/director commentary as it goes along.

Speaking for myself, I expect that the behind-the-scenes commentary would be the most interesting part of the project!

> The "I'm a GPT that wants everyone to be friends and how" is increasingly better in those episodes.

How long does the pipeline take to run? (Apologies if this was part of the blog series and I missed it.) Depending on how close the whole process is to a self-running CI pipeline, I think it might be interesting to run benchmarks against various versions of the pipeline and evaluate its performance at each stage. I feel like I could better evaluate the improvement of the "let's make everyone be friends!" writing if I'm comparing Episode 1 (compiled w/ v0.3) against Episode 1 (compiled w/ v0.8), instead of Episode 1 vs. Episode 12.

Crazy idea: If one could somehow quantify the quality of consistency, dialogue, camera work, etc -- then you may be able to watch numbers-go-up in an actual graph sort of way (I'm imagining a multi-agent system where various agents are responsible for monitoring various aspects of script and production quality -- almost like an actor/critic setup).
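To make the critic idea a bit more concrete, here is a minimal sketch of what "one critic per aspect" could look like. Everything in it is an assumption for illustration: the `Critic` type, the two toy critics, and the `NAME: line` script format are hypothetical, and the heuristics are crude stand-ins for what would more plausibly be LLM-based judges.

```python
from statistics import mean
from typing import Callable, Dict, List

# A critic maps a script to a score in [0, 1]. In a real system each critic
# would likely be an LLM call; these toy heuristics are only placeholders.
Critic = Callable[[str], float]


def _speakers(script: str) -> List[str]:
    """Speaker names from lines shaped like 'NAME: text' (assumed format)."""
    names = []
    for line in script.splitlines():
        if ":" in line:
            name = line.split(":", 1)[0].strip()
            if name.isupper():
                names.append(name)
    return names


def dialogue_critic(script: str) -> float:
    """Fraction of non-empty lines that are spoken dialogue."""
    lines = [ln for ln in script.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return len(_speakers(script)) / len(lines)


def consistency_critic(script: str) -> float:
    """Toy proxy: share of distinct speakers who appear more than once."""
    names = _speakers(script)
    if not names:
        return 0.0
    recurring = {n for n in names if names.count(n) > 1}
    return len(recurring) / len(set(names))


def evaluate(script: str, critics: Dict[str, Critic]) -> Dict[str, float]:
    """Run every critic and add an unweighted 'overall' average."""
    scores = {name: critic(script) for name, critic in critics.items()}
    scores["overall"] = mean(scores.values())
    return scores


CRITICS: Dict[str, Critic] = {
    "dialogue": dialogue_critic,
    "consistency": consistency_critic,
}
```

Calling `evaluate(script_text, CRITICS)` once per pipeline version would give the per-aspect numbers to plot over time. The unweighted average is just the simplest choice; a real setup would probably weight the aspects differently.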

But at the very least, being able to do an A/B comparison between v0.3 and v0.6 could be very interesting for people curious about the internals.
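A rough sketch of what that A/B comparison might boil down to, assuming each pipeline version has already produced some per-metric scores for the same episode. The function name, the metric names, and the numbers in the usage note are all made up for illustration, not measurements from the actual project.

```python
from typing import Dict

# Hypothetical layout: scores[version][metric] -> score for one fixed episode.
Scores = Dict[str, Dict[str, float]]


def compare_versions(scores: Scores, baseline: str, candidate: str) -> Dict[str, float]:
    """Per-metric delta (candidate minus baseline) for metrics both versions share."""
    base = scores[baseline]
    cand = scores[candidate]
    return {
        metric: round(cand[metric] - base[metric], 3)
        for metric in base
        if metric in cand
    }
```

For example, with placeholder scores `{"v0.3": {"dialogue": 0.5, "consistency": 0.7}, "v0.6": {"dialogue": 0.7, "consistency": 0.6}}`, calling `compare_versions(scores, "v0.3", "v0.6")` reports dialogue up by 0.2 and consistency down by 0.1. Holding the episode fixed is the point: it isolates the effect of the pipeline version from the effect of the episode's content.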

> I've taken it about as far as I can on a solo basis. The next step is a team of 4-5 people levelling it up. Every piece could be 10x better, and it would be a different beast entirely if that happened. I think there are some super exciting directions this could go.

I think that's the really cool thing about what you've built here -- it's a complete pipeline, and every piece is present -- even if the pieces aren't in their final form, the fact that you've pieced together an entire pipeline is extremely compelling.

> (PS - Hi Han!)

Hi!! It was a very cool surprise to see your name pop up on my HN feed this morning. :D


I agree, BTS is definitely very interesting.

But I had 8 kids, ages 5-15, watch all of Ep1 _AND_ choose to watch Ep2 afterwards last night. They actually sat and watched, too, instead of having it on in the background... AND they were bummed they couldn't watch the super secret pilot episode (which has MAJOR audio issues - I couldn't bring myself to inflict it on them).

So I think something is there.

I agree, there are some great opportunities to track things somewhat more quantitatively. It takes ~15 minutes and $10 to generate a script depending on how fast OpenAI is feeling. So in a real scale v2 it would be very reasonable to explore this.

Man, I sure hope I get to build this further!

> So I think something is there.

Yes, I think so! That's super encouraging about holding the attention of a room of kids!

> It takes ~15 minutes and $10 bucks to generate a script depending on how fast OpenAI is feeling. So in a real scale v2 it would be very reasonable to explore this.

Yeah -- still a bit large to truly put into a CI pipeline that is running against every commit tho. :-/
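Back-of-the-envelope version of why, as a small sketch. The ~$10 and ~15 minutes per run come from the figures quoted above; the commits-per-day number in the usage note is purely an assumption, and `ci_cost` is a hypothetical helper.

```python
from typing import Dict


def ci_cost(
    commits_per_day: int,
    usd_per_run: float = 10.0,      # quoted above: roughly $10 per script
    minutes_per_run: float = 15.0,  # quoted above: roughly 15 minutes per script
) -> Dict[str, float]:
    """Daily spend and serial wall-clock time if every commit triggers a full run."""
    return {
        "usd_per_day": commits_per_day * usd_per_run,
        "hours_per_day": commits_per_day * minutes_per_run / 60,
    }
```

At an assumed 20 commits a day, `ci_cost(20)` works out to $200 and 5 hours of generation per day, which is why running it nightly or per release looks more realistic than running it per commit.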

Do you mind sharing your context window size? I always want to use local LLMs for rapid iteration -- I think a 32k window isn't too difficult (Mixtral supports this out of the box, I think?), but I've heard of people pushing 100k tokens locally. Even so, that's peanuts compared to what hosted LLMs are doing, and if quality of writing is your bottleneck, then you wouldn't want to stray too far away from GPT-4 / Claude.

> Man, I sure hope I get to build this further!

Yeah!! It really feels like you've latched onto a nugget of something here, and I'm excited to see what's next!