Hacker News
by nickelbox 3087 days ago
It sounds like it creates a mapping from lines to "bugginess." How much value do you think there would be in some sort of semantic analysis, e.g. "new function foo is suspect bc it calls bar, which has shown up in a lot of stack traces lately"?
2 comments

> How much value do you think there would be in some sort of semantic analysis

Disclaimer: I'm the founder of GitSense (https://gitsense.com), which is also focused on predictive defect analysis, among other things.

Incorporating semantic analysis, by cross-referencing semantic code changes with bug reports, static analysis reports, stack traces, etc., will be absolutely critical for defect analysis and automatic code generation, in my opinion. It's also not trivial from a computation, storage, or retrieval perspective. For example, running semantic diff analysis on every revision (on any possible branch) and cross-referencing the results with external data like bug reports, continuous integration results, etc. is a very expensive and complex operation.
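
To make the cross-referencing idea concrete, here is a minimal sketch of the "foo calls bar, and bar keeps showing up in stack traces" check from the question. It is only an illustration under assumed inputs, not how GitSense works: the call graph, the trace data, and the names `suspect_functions` and `threshold` are all hypothetical.

```python
from collections import Counter


def suspect_functions(changed_calls, stack_traces, threshold=2):
    """Return {changed function: [callees seen in at least `threshold` traces]}.

    changed_calls: dict mapping each new/changed function to the functions it calls
    stack_traces:  list of traces, each a list of function names (frames)
    """
    # Count each function once per trace so repeated frames do not inflate it.
    trace_counts = Counter()
    for trace in stack_traces:
        trace_counts.update(set(trace))

    suspects = {}
    for func, callees in changed_calls.items():
        hot = sorted(c for c in set(callees) if trace_counts[c] >= threshold)
        if hot:
            suspects[func] = hot
    return suspects


# Hypothetical data: foo is new and calls bar, which appears in two recent traces.
changed_calls = {"foo": ["bar", "baz"], "qux": ["baz"]}
stack_traces = [["main", "bar"], ["worker", "bar", "bar"], ["main", "baz"]]
print(suspect_functions(changed_calls, stack_traces))  # {'foo': ['bar']}
```

The lookup itself is cheap. The expensive part is producing `changed_calls` and keeping the trace counts fresh for every revision on every branch.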

For ML to work, you need good datasets, and to generate good datasets, you need lots of raw data (static analysis results, code change history, etc.) that can be mined and cross-referenced to produce meaningful data. Creating good datasets for ML in a scalable manner will require you to rethink how to extract, store, and retrieve code-related information.
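
As a rough sketch of what "cross-referenced" can mean here, the snippet below joins change records with the set of files later touched by bug fixes to produce labeled rows. The field names (`file`, `lines_changed`, `static_warnings`) and the labeling rule are assumptions made for illustration. A real pipeline would need far richer features and a more careful definition of "buggy".

```python
def build_dataset(changes, bug_fix_files):
    """Turn raw change records into labeled rows for training.

    changes:       list of dicts with "file", "lines_changed", and
                   optionally "static_warnings" (hypothetical schema)
    bug_fix_files: set of file paths later touched by a bug-fix commit
    """
    rows = []
    for change in changes:
        rows.append({
            "file": change["file"],
            "lines_changed": change["lines_changed"],
            # Missing static analysis results are treated as zero warnings.
            "static_warnings": change.get("static_warnings", 0),
            # Label is 1 if the file was later touched by a bug fix, else 0.
            "label": int(change["file"] in bug_fix_files),
        })
    return rows


# Hypothetical raw data.
changes = [
    {"file": "parser.py", "lines_changed": 40, "static_warnings": 3},
    {"file": "utils.py", "lines_changed": 5},
]
rows = build_dataset(changes, bug_fix_files={"parser.py"})
```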

GitSense is designed to be installed on every developer desktop/workstation, which is how I solve the computation problem. Since every developer workstation is designed to be a continuous indexing machine, indexing can be distributed across dozens, if not hundreds or thousands, of machines. Being able to index and cross-reference as fast as possible is absolutely critical, since the goal is to prevent developer mistakes from happening.
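
One simple way to spread indexing work across machines, shown purely as a sketch and not as a description of GitSense's scheduler, is to assign each revision to a workstation by hashing its ID. The worker names and revision IDs below are made up.

```python
import hashlib


def assign_worker(revision_id, workers):
    """Pick a worker for a revision deterministically from its ID."""
    # hashlib is used instead of hash() because hash() is salted per process.
    digest = hashlib.sha256(revision_id.encode("utf-8")).hexdigest()
    return workers[int(digest, 16) % len(workers)]


def partition(revisions, workers):
    """Group revisions by the worker responsible for indexing them."""
    plan = {w: [] for w in workers}
    for rev in revisions:
        plan[assign_worker(rev, workers)].append(rev)
    return plan


# Hypothetical workstations and revision IDs.
workers = ["alice-desktop", "bob-laptop", "carol-workstation"]
revisions = ["a1b2c3", "d4e5f6", "0789ab", "cdef01"]
plan = partition(revisions, workers)
```

Every machine computes the same assignment without coordination. The trade-off is that plain modulo hashing reshuffles most revisions when a workstation joins or leaves, and consistent hashing would reduce that.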

Generating semantic analysis is fairly straightforward. Incorporating it is where the challenge lies.

Hey nickelbox,

This is one of several problems that I'm trying to solve. As a developer, when you're calling an existing function, you don't really have any data at your disposal WRT the quality of that function. The same goes for modifying individual lines. To me, it seems obvious to want this information.

But I'm trying to test that hypothesis and see if other developers feel the same way.

Thanks,