| HN Mirror

by auszeph 61 days ago

Built similar for internal use at our work. Slack+JIRA though, not Linear. Otherwise GCP-native like this.

I didn't want to be on the hook for supporting an open source version though, so never made it public. Good on you for putting it out there.

A few differences I can quickly spot, fwiw...

I went with Firestore over Postgres for the lower cost, and I use Cloud Tasks for "free" deduping of webhooks. Each webhook is validated, translated, and immediately enqueued as a Cloud Task. The tasks get deduped by ID.
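A minimal sketch of the dedup-by-ID idea, assuming each webhook carries a stable delivery ID: derive a deterministic task ID from it, so a redelivery maps to the same task name and is rejected. `InMemoryQueue`, `task_id_for`, and `enqueue_webhook` are hypothetical names, and the in-memory queue is only a stand-in so the sketch runs without GCP; this is not the setup described above.

```python
import hashlib


def task_id_for(source: str, delivery_id: str) -> str:
    """Derive a deterministic task ID from a webhook delivery ID.

    Hashing keeps the ID within the characters Cloud Tasks allows in task
    IDs (letters, digits, hyphens, underscores). `source` is assumed to be
    a short label such as "slack" or "jira".
    """
    digest = hashlib.sha256(f"{source}:{delivery_id}".encode()).hexdigest()
    return f"{source}-{digest[:32]}"


class InMemoryQueue:
    """Hypothetical stand-in for a queue that rejects duplicate task IDs."""

    def __init__(self) -> None:
        self.tasks: dict[str, dict] = {}

    def create_task(self, task_id: str, payload: dict) -> bool:
        """Return True if enqueued, False if this task ID already exists."""
        if task_id in self.tasks:
            return False
        self.tasks[task_id] = payload
        return True


def enqueue_webhook(queue: InMemoryQueue, source: str, delivery_id: str, payload: dict) -> bool:
    """Enqueue a validated webhook; a redelivery with the same ID is dropped."""
    return queue.create_task(task_id_for(source, delivery_id), payload)
```

With the real Cloud Tasks client, the duplicate would surface as an "already exists" error on task creation rather than a `False` return.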

We see a lot of value in a scheduler: running a prompt on a schedule is good for things like status reports or automated log reading and debugging.

I prefer to put my PEMs into KMS instead of Secret Manager. You can still sign things, but without exposing the actual private key where it can be snooped on.
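To illustrate why this works: the signing code only needs a function that turns bytes into a signature, so the key itself can stay inside KMS. A minimal sketch, assuming the key is used to sign RS256 JWTs (an assumption; the comment does not say what is being signed). `build_jwt` and `fake_sign` are hypothetical; in production the injected `sign` callable would wrap a KMS asymmetric-sign call, and here a fake signer stands in so the sketch runs locally.

```python
import base64
import hashlib
import json
from typing import Callable


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def build_jwt(claims: dict, sign: Callable[[bytes], bytes]) -> str:
    """Assemble a JWT, delegating the signature to an injected signer.

    The caller never sees private key material; it only passes in a
    function that returns a signature for the given bytes.
    """
    header = {"alg": "RS256", "typ": "JWT"}
    encoded_header = b64url(json.dumps(header, separators=(",", ":")).encode())
    encoded_claims = b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{encoded_header}.{encoded_claims}"
    signature = sign(signing_input.encode())
    return f"{signing_input}.{b64url(signature)}"


def fake_sign(message: bytes) -> bytes:
    """Placeholder signer for local runs; NOT a real RSA signature."""
    return hashlib.sha256(b"fake-key" + message).digest()
```

Swapping `fake_sign` for a KMS-backed signer changes nothing else in the calling code, which is the point of keeping the key behind a signing interface.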

I run the actual jobs on spot VMs using an image baked by Packer with all the tooling needed. You don't run into time/resource limits running them as Cloud Run jobs?

yzhong94 61 days ago

Haha, we are definitely like-minded because our internal Broccoli is actually on Firestore. That being said, Firestore is an acquired taste, so we rewrote the OSS backend on Postgres so that everything can be deployed in one go with the infra that people are most familiar with.

Re: spot VMs. Great idea! There are two features we have not finished porting to OSS. Internally, we can specify the instance type and timeout, and we also send about 50% of jobs to Blaxel; we find it has a much better cold start compared to Cloud Run. We probably will port the multi-vendor support logic over to OSS soon but wanted to keep v1 simple (and a one-provider magic experience!).

Scheduler is a wish-list item for us. Curious how you implemented it? Currently, we just have a scheduled Cloud Function that runs during the night to automatically address open PR comments (via the Broccoli GitHub feedback automation) so that the engineer wakes up to a mostly clean PR without needing to do anything. We haven't ported this to the OSS version yet because 1) it depends on Firebase Cloud Functions and 2) we're not sure what the best ergonomics would be. Any suggestions here?

auszeph 61 days ago

Ours currently runs with Cloud Tasks, which involves some cleanup handling if one run fails to enqueue the next.

Originally I had Cloud Scheduler running a heartbeat task every X mins, and one of the heartbeat tasks was to look for any overdue scheduled tasks and fire them off. So they were not very precise in timing, but it was a very simple setup.
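A minimal sketch of that heartbeat pass, under stated assumptions: `ScheduledPrompt`, `heartbeat`, and the `fire` callback are hypothetical names, and schedules are held in memory rather than in a datastore. Timing precision is bounded by how often the heartbeat runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class ScheduledPrompt:
    name: str
    interval: timedelta
    next_run: datetime


def heartbeat(
    schedules: list[ScheduledPrompt],
    now: datetime,
    fire: Callable[[ScheduledPrompt], None],
) -> list[str]:
    """Fire every overdue schedule once and return the names that fired."""
    fired = []
    for schedule in schedules:
        if schedule.next_run <= now:
            fire(schedule)
            # Advance from `now` rather than from the missed time, so a long
            # outage produces one catch-up run instead of a burst.
            schedule.next_run = now + schedule.interval
            fired.append(schedule.name)
    return fired
```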

I made the move to Cloud Tasks so I could heartbeat less often. Now the cleanup happens in the heartbeat: it ensures every scheduled task has a matching Cloud Task pending.
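The cleanup step amounts to a set difference between what should be scheduled and what is actually queued. A minimal sketch, assuming schedules and pending tasks share an ID; `ensure_pending` and the `enqueue` callback are hypothetical, and listing the pending tasks from the real queue is left out.

```python
from typing import Callable, Iterable


def ensure_pending(
    scheduled_ids: Iterable[str],
    pending_task_ids: Iterable[str],
    enqueue: Callable[[str], None],
) -> list[str]:
    """Re-enqueue any schedule with no pending task; return the repaired IDs."""
    missing = sorted(set(scheduled_ids) - set(pending_task_ids))
    for schedule_id in missing:
        # A chain breaks when one run fails to enqueue its successor;
        # the heartbeat restores it here.
        enqueue(schedule_id)
    return missing
```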

Feedback on PRs was an interesting challenge, since we can get it from Slack replies, GitHub comments, and CI failures, and we want to be fairly reactive. I ended up leaning on Firestore realtime queries: the harness on the agent VM is subscribed and can interrupt the agentic loop to feed in new feedback as it comes in. It all gets very complicated to open source, but it has helped to get quicker feedback loops going.
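A rough sketch of the interrupt pattern, with the realtime subscription replaced by a plain thread-safe queue so it runs locally. Everything here is an assumption for illustration: `run_agent_loop` is a hypothetical name, and in a real harness a listener callback would push incoming feedback onto the queue from a background thread.

```python
import queue


def run_agent_loop(steps: list[str], feedback: queue.Queue) -> list[str]:
    """Run planned steps, draining any new feedback before each one."""
    log = []
    for step in steps:
        # Check for feedback between steps so it reaches the agent
        # before more work is done on a stale plan.
        while True:
            try:
                item = feedback.get_nowait()
            except queue.Empty:
                break
            log.append(f"feedback: {item}")
        log.append(f"step: {step}")
    return log
```

Checking between steps is the simplest form of interruption; cutting into a step that is already running would need cancellation support in the harness.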