HN Mirror

by danShumway 813 days ago

> To achieve this, we first instrument the test using sys.settrace (or, on Python 3.12+, the far better sys.monitoring!) to keep a history of all the functions that were called, along with the calling line numbers. We then re-run the test and use AST parsing to find all the variable assignments and keep track of those changes over time. We also use AST parsing to obtain the source code for these functions.
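The instrumentation described in the quote can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `record_calls`, `find_assignments`, and `function_source` are hypothetical helper names. The sketch uses `sys.settrace` (available on every CPython version) rather than `sys.monitoring`. It also only locates assignment sites statically; tracking the assigned values over time would additionally need line-level tracing.

```python
import ast
import sys

# Hypothetical helpers illustrating the approach quoted above;
# this is not the project's implementation.


def record_calls(func, *args, **kwargs):
    """Run func under sys.settrace and record (function name, calling line) pairs."""
    history = []

    def tracer(frame, event, arg):
        if event == "call":
            caller = frame.f_back
            history.append((frame.f_code.co_name, caller.f_lineno if caller else None))
        return None  # skip per-line tracing; only 'call' events are needed here

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever tracer (e.g. coverage) was active
    return result, history


def _target_names(target):
    """Collect plain variable names bound by an assignment target (skips attributes/subscripts)."""
    if isinstance(target, ast.Name):
        return [target.id]
    if isinstance(target, (ast.Tuple, ast.List)):
        return [n for elt in target.elts for n in _target_names(elt)]
    if isinstance(target, ast.Starred):
        return _target_names(target.value)
    return []


def find_assignments(source):
    """Statically list (line number, assigned names) for each assignment in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets = [node.target]  # a bare annotation like `x: int` binds nothing
        else:
            continue
        names = [name for t in targets for name in _target_names(t)]
        found.append((node.lineno, names))
    return sorted(found)


def function_source(source, name):
    """Return the source text of the first function named `name`, or None."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```

To try it, compile a small module string with `exec(compile(src, "<sample>", "exec"), ns)` and pass `ns["some_function"]` to `record_calls`. Only Python-level frames produce `call` events, so builtins and C extension calls do not appear in the history.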

I don't want to be negative on someone's Show HN post, but it seems like getting all of this and showing it to the user would be way more helpful than showing it to the LLM?

One test I sometimes use when I'm thinking about this kind of stuff is: "Would I want this if the LLM were swapped out for an actual human?" So would I want a service that gathers all this useful information, hands it off to a Python coder (even a very good Python coder) with no other real context about the overall project, and then makes me ask them why my test broke instead of letting me look at the info myself? I don't think I'd want that. I've worked with co-workers whom I really respect, and I still don't want to do remote debugging with them over Slack; I want to be able to see the data myself.

Going through a middleperson to find out which code paths my code has hit will nearly always be slower than just showing me the full list of every code path my code just hit. Of course I want filtering and search and all of that, but I want those as ways of filtering the data, not ways of controlling access to the data.

It feels like you've made something really useful -- an omniscient debugger that tracks state changes over time -- and then you've hooked it up to something that would make it considerably less useful. I've done debugging with state libraries like Redux, where I can track changes to data over time, and it makes debugging way easier. It's great; it changes my relationship to how I think about code. So it's genuinely super cool to be able to use something like that in other situations. But at no point have I ever thought while using a state tracking tool, "I wish I had to have a conversation with this thing in order to get access to the timeline."

Again, I don't want to be too negative. AI is all the hotness, so I guess if you can pump all of that data into an LLM, there's no reason not to, since it'll generate more attention for the project. But it might not be a bad idea to also allow straight querying of the data passed to the LLM, plus data export that could be used to build more visual, user-controlled tools.

Just my opinion, feel free to disregard.


skydhash 813 days ago

Not wanting to be negative either, but I've used debuggers like gdb, the ones in JetBrains's IDEs, and Xcode's, and every time, it's not a lack of information that's stopping me from solving the issue. Coding Common Lisp with SLY and Emacs, or Smalltalk with Pharo, is much more entertaining than chatting with an LLM. Coding with a good debugger is very close to that (even the one inside the browser for JavaScript). I think we can design better tools than hooking everything up to an LLM that requires 128 GB of RAM to run locally.

kvptkr 813 days ago

Interesting! I think you're right in saying that the middleman you're talking about has to be really good for something like this to actually be useful, especially for people who are very comfortable with debugging tools and their codebase. If I understand correctly, you're saying that the most productive tool for you would be one that presents you with more relevant data in a structured way (Redux, etc.). At first, we actually did think of making a nice IDE with all that data, but found out it kind of already exists (https://pytrace.com/), and we found it more cumbersome to use than anything! Our belief is that these tools can help, but there's an irreducible amount of reasoning that needs to happen, which takes time and effort, and we think a tool like this might be able to reduce that by offloading the reasoning to LLMs. I guess what I'm saying is that there's a cap on how much useful information a tool can give a developer to help them reason better, and I'm really interested in seeing whether it's possible to reduce how much reasoning is needed in the first place. Curious to hear what you think, and thanks for the thoughtful comment!

danShumway 813 days ago

:shrug: I'm one person on the Internet, so if using the LLM makes it work better for more of your users, go with that.

I do think data filtering and visualization are an important value-add. Pytrace looks cool, but it doesn't look much different from what I get when I debug JavaScript in a web browser, so I think there's a ton of room to improve. Visual representations of data transformations and code paths are a relatively unexplored area across the entire software industry, imo.

If an LLM could flat-out reduce my need to reason and fix the bug for me, great -- but I've worked with coders that I respect a lot, and remote debugging with them has always been a pain and made narrowing down issues harder. I've never enjoyed debugging something remotely where I was working through a middleperson and couldn't look at what was going on; it's helpful to have multiple eyes on the code, but not if I have to use someone else's eyes to look at what's happening.

So for me, in order to successfully reduce the amount of reasoning I need to do and overcome the downside of me not being able to visualize the timeline/data, the LLM would need to be better at fixing these bugs than professional developers in industry: developers who are already intimately familiar with the codebases I'm debugging because they wrote a significant portion of the code. It would need to be better at coding than professional humans. And I just don't think there's anyone who would say that GPT-4 is close to that level yet.

What I could see is, maybe -- if I have access to that data, and the LLM is just kind of on the side, then it could offer helpful advice without any downside, because I would still be able to debug as fast as I can using all of the available data, and if the LLM occasionally finds something I missed, great. Peer-debugging sessions with multiple coders are great, so at least in theory I could see some value from an LLM there, even if I'm a little skeptical about how well it would perform. And if the LLM weren't standing in front of all the data, then if it didn't work, no worries: the data would still be there.

But again, if people like it, then it doesn't matter what I think. Why I wouldn't use the tool is less important than why someone would use the tool, and if integration with the LLM makes people want to use the tool, then... I mean, not everyone has identical work styles. Different things might work for different people.
