Hacker News
by pdimitar 7 hours ago
> Does anyone think a startup with a good product is going to be materially disadvantaged by not having access to an incrementally better security focused LLM release?

- It's not "incrementally better". It's a complete game changer. Opus 4.8 on max thinking makes X mistakes in my commercial work. Fable 5 made 5% of X. Counted. I barely had anything to contribute in the work sessions; for a full week I could count on my two hands the total number of times I actually caught Fable 5 in a mistake -- and some of those were not true mistakes, more like divergence from policy in our `CLAUDE.md` files.

- It's not "security focused". It's simply better in every way _plus_ it's also security-conscious.

- It legitimately accelerated my work. I don't have too many unknowns in my work, I simply have way too much to do. Fable 5 was an objective and measurable improvement over Opus 4.8. Returning to Opus 4.8 after Fable 5 was removed was extremely discouraging and frustrating, and still is to some extent.

> It’s lots of fun to pretend it’s some step-change that’s too dangerous for general release

Maybe, but not as much fun as tearing down a straw man apparently. :)

> (Just to be clear, I think the gatekeeping is ridiculous, especially given the above)

It's ridiculous for multiple other reasons but ridiculous nonetheless.

2 comments

> I don't have too many unknowns in my work, I simply have way too much to do.

Interesting, I'm curious what work you do? My software engineering career has never been in that situation; it's always the ambiguity and the unknowns that trump everything else.

Fair question, and I was vague just so as not to balloon the comment.

I work in a financial startup. The codebase is a mess and very much spaghettified. One rework that forced us to migrate our data model from 1:1 users<->loans to M:N (many-to-many) has touched ~40% of the codebase... multiple times. Huge churn. It just crossed two months of work, though it's now in its very final phases.

I know what I must do:

- Introduce and enforce structs for passing context and input shapes around, so as to stop fighting with NULLs, missing keys in maps and other maddening cases that inflate your line count for no other reason than programming languages lacking higher-order constructs for well-researched and mostly solved computer science problems (sigh; not going to rant about that here, but it does tick me off how we are _all_ constantly reinventing the same wheels almost every day).

- Saga discipline: if step 6/9 in a pipeline fails, revert everything up to this point, even if it was touched by a 3rd party API.

- Compensation/undo steps, including flagging / logging those that cannot be undone (sadly some of our 3rd party APIs are like that).

- Introduce a universal runtime validator library that enforces contracts -- including conditional validation, i.e. "only validate field Z if field X is present and is a positive integer and if field Y is present and is a valid UUID".

- Introduce runtime contracts / invariant enforcement.

- Introduce our own dynamic workflow engine, piggybacking off of a few free and unencumbered solutions in the language of choice's ecosystem.
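For anyone unfamiliar with the saga and compensation bullets above, here is a minimal sketch of the idea. Python is used purely for illustration (it is not necessarily the language of the codebase discussed here), and `Step`, `SagaFailed` and `run_saga` are hypothetical names, not an existing library. A step whose `compensate` is `None` stands in for a 3rd party call that cannot be undone; it gets reported rather than silently skipped.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    # None marks a step that cannot be undone (e.g. a third-party call
    # with no reversal endpoint).
    compensate: Optional[Callable[[dict], None]] = None


class SagaFailed(Exception):
    """Raised after a failed step, once compensation has been attempted."""

    def __init__(self, failed_step: str, cause: Exception, not_undone: list):
        super().__init__(f"step {failed_step!r} failed: {cause}")
        self.failed_step = failed_step
        self.cause = cause
        # Names of completed steps that could not be reverted.
        self.not_undone = not_undone


def run_saga(steps: list, ctx: dict) -> dict:
    completed = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            not_undone = []
            # Revert already-completed steps in reverse order.
            for done in reversed(completed):
                if done.compensate is None:
                    not_undone.append(done.name)
                    continue
                try:
                    done.compensate(ctx)
                except Exception:
                    # A failing undo is recorded, not swallowed.
                    not_undone.append(done.name)
            raise SagaFailed(step.name, exc, not_undone) from exc
        completed.append(step)
    return ctx
```

The caller gets one exception that says which step failed and which earlier steps are still "live" in the outside world, which is exactly the list that would need logging or manual follow-up.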

...And these are just off the top of my head after I slept only 4.5h and woke up due to the heat. And each one of these can take from 2 to 6 weeks _even_ with Opus driving all coding and me reviewing and keeping it behaving within my policies and coding standards.
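To make the conditional validation bullet concrete, here is a rough sketch of "only validate field Z if field X is a positive integer and field Y is a valid UUID". Again Python only for illustration; `validate_when`, `is_positive_int`, `is_uuid` and `validate` are hypothetical helpers, not the API of any real validator library.

```python
import uuid
from typing import Any, Callable


def is_positive_int(key: str) -> Callable[[dict], bool]:
    def check(data: dict) -> bool:
        value = data.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value > 0
    return check


def is_uuid(key: str) -> Callable[[dict], bool]:
    def check(data: dict) -> bool:
        value = data.get(key)
        if not isinstance(value, str):
            return False
        try:
            uuid.UUID(value)  # accepts any string form the stdlib parser accepts
        except ValueError:
            return False
        return True
    return check


def validate_when(conditions: list, key: str, rule: Callable[[Any], bool], message: str):
    """Build a validator that applies `rule` to `key` only if all conditions hold."""
    def validator(data: dict) -> list:
        if not all(cond(data) for cond in conditions):
            return []  # preconditions unmet: the rule does not apply
        if key not in data or not rule(data[key]):
            return [f"{key}: {message}"]
        return []
    return validator


def validate(data: dict, validators: list) -> list:
    errors = []
    for validator in validators:
        errors.extend(validator(data))
    return errors
```

Usage would look something like `validate(payload, [validate_when([is_positive_int("x"), is_uuid("y")], "z", rule, "must be ...")])`, returning a list of error strings that is empty when the payload passes.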

Me & Claude are maintaining a TODO list that is no smaller than 150 items at this point (though in fairness, at least 75% of them are fairly small and not architectural like the ones above).

I believe I know how to architect this thing but business customers and the CEO keep coming back with feature requests which of course always take priority.

When Fable 5 was around, for a mere 4 workdays, I not only got ahead of my own schedule feature-work-wise but even had the bandwidth to start tackling a few other architectural decisions. I tightened them up in `CLAUDE.md`, and Fable even devised an opinionated AST linter for test discipline (disallow direct DB access in our tests; only go through the domain/context modules to do so). It helped me start turning the tide.
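To give an idea of what an AST linter for test discipline can look like, here is a small sketch, not the actual linter from this project. It uses Python's stdlib `ast` module for illustration and assumes a hypothetical layout where direct DB access lives under `app.db` and tests are supposed to go through something like `app.domain` instead.

```python
import ast

# Hypothetical module path for direct DB access; adjust to the real layout.
FORBIDDEN_PREFIXES = ("app.db",)


def _is_forbidden(module: str) -> bool:
    return any(
        module == prefix or module.startswith(prefix + ".")
        for prefix in FORBIDDEN_PREFIXES
    )


def find_direct_db_imports(source: str, filename: str = "<test>") -> list:
    """Return (filename, line, module) for every forbidden import in `source`."""
    violations = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Covers both `from app.db import x` and `from app import db`.
            # Relative imports (level > 0) are not checked in this sketch.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            if _is_forbidden(name):
                violations.append((filename, node.lineno, name))
                break  # one report per import statement is enough
    return violations
```

Run over every test file in CI (or in an agent hook), a non-empty result fails the build, so the rule no longer depends on the agent remembering what is written in a prompt file.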

This all went out the window when I had to go back to Opus 4.8. It's still _very_ good, mind you, but periodically it does feel like I am a special-education teacher. It forgets disciplines we have discussed and codified likely 15-20 times at this point, forgets important project context, attempts to reintroduce subtle bugs, and a few other things.

My next move is, with or without Fable, to continue its work and enrich the AST-based linters, converting the theoretical prompt-based guard-rails into actual LLM hooks and compiler / runtime-at-startup hooks so the agent cannot ignore them.

I don't enjoy harness engineering, but the interesting and very positive effect has been that it helped me think more like an architect and less like a code monkey. I hugely appreciate that, and I only realized I had been missing it for years once it started happening again.

Hope that helps put things in context.

Fable wasn’t available for a full week. It was released on June 9 and made unavailable June 12.
Okay, I might have mistaken 4 work days for 5.