| I’d like to share my thoughts as someone who uses Python and Claude Code daily (I’ve been running a research codebase and a trading bot for several months). I generally agree with the comment that “architecture is the bottleneck,” but I’d like to elaborate based on my own experience. I don’t think the issue lies in code generation capability. The code LLMs generate is competent in isolation; the real bottleneck is cross-cutting consistency, which I believe is the primary challenge for applications on the scale of Photoshop. For example, when I had Claude add a new order type to my trading bot:
- Implementation in the relevant file: 90% success on the first try
- Compatibility fixes on the backtesting engine side: 60% success with no oversights
- Cross-cutting concerns like logging, metrics, and notifications: 40% of these were missed

The missed parts pass both compilation and testing. This is the most troublesome kind of failure I’ve run into: the code violates the specification, yet nothing detects it mechanically.

Photoshop has an estimated tens of thousands of cross-cutting invariants. Every tool must operate without conflict across all layer types, selection ranges, and color modes. Reconciling all of this within a single LLM inference seems impossible with the current architecture. In other words, the absence of a “vibecoded Photoshop” isn’t due to a superficial lack of capability in the LLM; rather, the current context window and attention mechanisms are structurally unsuited to maintaining global invariants. This may not be the kind of problem that can be solved simply by “scaling up.”

Conversely, the direction of “personalized bespoke small apps” pointed out by stevex has fewer cross-cutting invariants (since the functionality is localized) and aligns with areas where AI excels. My personal conclusion is that Photoshop and AI development are not competing; they are simply solving different problems.

Since these observations are based on Python projects, this cross-cutting failure pattern might be less pronounced in statically checked languages like Java or Rust. I’d like to hear others’ observations on this. |
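
To make the "passes tests but violates the spec" failure concrete, here is a minimal Python sketch. Every name in it (`OrderType`, `EXECUTORS`, `NOTIFIERS`, `place_order`, `missing_handlers`) is hypothetical and chosen for illustration; none of it comes from a real trading bot. It assumes a design where each concern keeps its own per-order-type registry, which is only one of many ways such code might be organized.

```python
from enum import Enum


class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"
    STOP = "stop"  # hypothetical newly added order type


# Core execution path: updated for the new type.
EXECUTORS = {
    OrderType.MARKET: lambda qty: f"execute market x{qty}",
    OrderType.LIMIT: lambda qty: f"execute limit x{qty}",
    OrderType.STOP: lambda qty: f"execute stop x{qty}",
}

# Cross-cutting concern (notifications): STOP was overlooked here.
NOTIFIERS = {
    OrderType.MARKET: lambda qty: f"market order filled: {qty}",
    OrderType.LIMIT: lambda qty: f"limit order filled: {qty}",
}


def place_order(order_type, qty):
    """Execute an order and build its notification, if a notifier exists."""
    result = EXECUTORS[order_type](qty)
    # The lenient .get() means a missing notifier raises nothing:
    # the order executes, tests on `result` pass, and the gap stays hidden.
    notifier = NOTIFIERS.get(order_type)
    message = notifier(qty) if notifier else None
    return result, message


def missing_handlers(registry):
    """Return the order types that a registry does not cover, sorted by name."""
    return sorted(
        (t for t in OrderType if t not in registry),
        key=lambda t: t.name,
    )
```

Adding a check such as `assert not missing_handlers(NOTIFIERS)` to the test suite would turn this particular omission into a mechanical failure. It only helps for invariants someone has already thought to register, though, which is the part that gets hard as the number of cross-cutting concerns grows.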