
I've been experimenting with running a few models for local inference. Some of them get "stuck" in a loop, retrying the same thing endlessly, which is weird. Others are really good. If they can ever handle about 400k tokens without going batcrap crazy, I'll be impressed, mostly because I would like them to read more of the codebase instead of just making assumptions. (Maybe less would do, but from my experience with Claude after the 1 million token increase, 400k seemed to be a good sweet spot.) That said, I've been building a custom harness, and I'm just about to start working on its tool-building features. I already have a system similar to what Beads does, but I didn't like some things about Beads, so I made my own to track tasks. That way the context window doesn't need to be super massive for task tracking.