by LogicFailsMe 103 days ago
OldAF. I have more ideas than I have time to code up prototypes. Claude Code has changed all that. And given that it cannot improve the performance of the optimized code I've written so far, it's like having a never-tiring, eager junior engineer to work out how to use frameworks and APIs to deploy my code.

A year ago, Cursor was flummoxed by simple things that Claude Code navigates with ease. But there are still corner cases where it hallucinates on the strangest, seemingly obvious things. I'm currently working on getting it to write code that makes what's going on in front of its face more visible to it.

I guess it's a question of where you find joy in life. I find no joy in frameworks and APIs. I find it entirely in doing the impossible, out-of-sample things for which these agents are not yet competitive.

I will even say IMO AI coding agents are the coolest thing I've seen since I saw the first cut of CUDA 20 years ago. And I expect the same level of belligerence and resistance to it that I saw deployed against CUDA. People hate change by and large.

Can you elaborate on "resistance against cuda"? What were people clinging to instead?
IMO it was mostly that people didn't want to rewrite (and maintain) their code for a new proprietary programming model they were unfamiliar with. People also didn't want to invest in hardware that could only run code written in CUDA.

Lots of people wanted (and Intel tried to sell, somewhat successfully) something plug-and-play that would just run the parallel implementations they'd already written for x86 supercomputers. It seemed easier. Why invest all of this effort into CUDA when Intel is going to come along and make your current code run just as fast as this strange CUDA stuff in a year or two?

Deep learning is quite different from the earlier uses of CUDA. Those use cases were often massive, often old, FORTRAN programs where, to get things running well, you had to write many separate kernels targeting each bit. And it all had to be on the GPU to avoid expensive copies between GPU and CPU, and early CUDA was a lot less programmable than it is now, with huge performance penalties for relatively small "mistakes". Also, many of your key contributors are scientists rather than professional programmers, and they see programming as getting in the way of what they actually want to do. They don't want to spend time completely rewriting their applications and optimizing CUDA kernels; they want to keep on with their incremental modifications to existing codebases.
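To put rough numbers on why "write many separate kernels targeting each bit" hurt so much, here is a minimal sketch using Amdahl's law. The function name and the 100x kernel speedup are assumptions picked for illustration, not measurements from any real application.

```python
def amdahl_speedup(ported_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the original runtime is accelerated.

    ported_fraction: share of the original runtime that was moved to GPU kernels (0..1).
    kernel_speedup:  how much faster that ported share runs on the GPU.
    """
    if not 0.0 <= ported_fraction <= 1.0:
        raise ValueError("ported_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The unported share still runs at the old speed; the ported share shrinks.
    return 1.0 / ((1.0 - ported_fraction) + ported_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical 100x kernels, with varying coverage of a legacy code.
    for fraction in (0.5, 0.9, 0.99):
        overall = amdahl_speedup(fraction, 100.0)
        print(f"{fraction:.0%} of runtime ported -> {overall:.1f}x overall")
```

Under these assumed numbers, the overall gain stays small until nearly all of the runtime is ported, which is a much bigger job for a sprawling legacy code than for a framework with a handful of hot operations.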

Then deep learning came along and researchers were already using frameworks (Lua Torch, Caffe, Theano). The framework authors only had to support the few operations required to get Convnets working very fast on GPUs, and it was minimal effort for researchers to run. It grew a lot from there, but going from "nothing" to "most people can run their Convnet research" on GPUs was much easier for these frameworks than it was for any large traditional HPC scientific application.

Thanks!

It seems funny though: The advantages of GPGPU are so obvious and unambiguous compared to AI. But then again, with every new technology you probably also had management pushing to use technology_a for <enter something inappropriate for technology_a>.

Like in a few decades when the way we work with AI has matured and become completely normal it might be hard to imagine why people nowadays questioned its use. But they won't know about the million stupid uses of AI we're confronted with every day :)

> The advantages of GPGPU are so obvious and unambiguous

I remember being a bit surprised when I started reading about GPUs being tasked with processes that weren't what we'd previously understood to be their role (way before I heard of CUDA). For some reason that I don't recall, I was thinking about that moment in tech just the other day.

It wasn't always obvious that the earth revolved around the sun. Or that using a mouse would be a standard for computing. Knowledge is built. We're pretty lucky to stand atop the giants who came before us.

I didn't know about CUDA until however many years ago. Definitely didn't know how early it began. Definitely didn't know there was pushback when it was introduced. Interesting stuff.

I'm dealing with someone in 2026 insisting that everything has to be written in Python and rely entirely on torch.compile for acceleration rather than any bespoke GPU kernels. Times change, people don't.
The completely low-information, amateur-hour aspect of what our HPC Welfare Queens were pushing above was that a couple of hours invested in coding for Intel's Xeon Phi alternative to GPUs demonstrated the folly of their BS "recompile and run" strategy, and any attempt to code the thing exposed how much better a design CUDA was than the series of APIs of The Month that followed*. I was all but blacklisted by the HPC community for standing up to this and insisting on CUDA or I walk. My favorite quote was "You lack vision and you probably wouldn't have backed the Apollo program or Lewis and Clark." Good times, good times...

*But TBF, Xeon Phi was not a complete disaster: if you coded it in assembler, you could squeeze out Fermi-class GPU performance. Good luck getting the "recompile and run" crowd to do that, though. They segued from that to relying on compiler directives going forward, and that's how NVDA got a decade+ head start that should never have happened, but did. Today a lot of these sorts are insisting that, because of autograd, everything should be written in Python and compiled with an autograd DSL like torch. I am so glad I am close to retirement on that front. I already trust coding agents more than I trust this mindset.

Phi was cool, I think it could have been leveraged into something great. Imagine all consumer CPUs coming with 512 little pentiums in them or something like that.
And ahead of GPUs in some ways at the time. But that was entirely squandered by their idiotic recompile-and-run marketing. There was some serious denial that thread blocks that could synchronize without thunking back to the CPU, along with the intuitive nature of warp programming, were pretty much a hardware moat against anything that couldn't do the equivalent.

But good luck explaining that to technical leaders who hadn't written a line of code in over a decade and yet somehow were in charge of things. People really need to consider the backstory here if they want to do better going forward, but I don't think they will. I think history is going to rhyme again.

In the beginning, valid claims of 100x to 1,000x for genuine workloads due to HW-level advances enabled by CUDA were denied on the grounds that they ignored CPU and memory copy overhead, or that they were only being measured relative to single-core code, etc. No amount of evidence to the contrary was sufficient for a lot of people who should have known better. And even if they believed the speedups, they were the same ones saying Intel would destroy them with their roadmap. I was there. I rolled my eyes every single time, but then AI happened and most of them (but not all of them) denied ever spouting such gibberish.
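For anyone who wants to check a speedup claim against the two objections above, here is a minimal sketch of the arithmetic. The function name and all timings are made-up assumptions for illustration, not benchmarks; the baseline is whatever CPU time you choose to compare against, single-core or multicore.

```python
def end_to_end_speedup(cpu_seconds: float,
                       gpu_kernel_seconds: float,
                       copy_seconds: float) -> float:
    """Speedup over a CPU baseline once host<->device copy time is counted."""
    if min(cpu_seconds, gpu_kernel_seconds) <= 0.0 or copy_seconds < 0.0:
        raise ValueError("times must be positive (copy time may be zero)")
    return cpu_seconds / (gpu_kernel_seconds + copy_seconds)


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to show the calculation.
    cpu_baseline = 100.0   # assumed CPU baseline for the whole workload
    gpu_kernels = 0.5      # assumed total time spent in GPU kernels
    copies = 0.5           # assumed total host<->device transfer time

    kernel_only = end_to_end_speedup(cpu_baseline, gpu_kernels, 0.0)
    with_copies = end_to_end_speedup(cpu_baseline, gpu_kernels, copies)
    print(f"kernel-only speedup: {kernel_only:.0f}x")
    print(f"speedup including copies: {with_copies:.0f}x")
```

The point of the sketch is only that copy overhead and the choice of baseline are both measurable, so a claimed speedup can be checked by putting them in the denominator and the numerator respectively.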

Won't name names anymore, it really doesn't matter. But I feel the same way about people still characterizing LLMs as stochastic parrots and glorified autocomplete as I feel about certain CPU luminaries (won't name names) continuing to state that GPUs are bad because they were designed for gaming. Neither sort is keeping up with how fast things change.