
> How much value do you think there would be in some sort of semantic analysis

Disclaimer: I'm the founder of GitSense (https://gitsense.com), which is also focused on predictive defect analysis, among other things.

Incorporating semantic analysis, by cross-referencing semantic code changes with bug reports, static analysis reports, stack traces, etc., will be absolutely critical for defect analysis and automatic code generation, in my opinion. It's also not trivial from a computation, storage, or retrieval perspective. For example, running semantic diff analysis on every revision (on any possible branch) and cross-referencing the results with external data like bug reports and continuous integration results is a very expensive and complex operation.
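To make "cross-referencing semantic code changes" concrete, here is a minimal sketch, not GitSense's implementation. It assumes Python source, treats a "semantic change" as a top-level function whose AST differs between two revisions, and matches those functions against the function names in a stack trace. The helper names (`function_asts`, `semantic_diff`, `cross_reference`) are hypothetical.

```python
import ast


def function_asts(source: str) -> dict[str, str]:
    """Map each top-level function name to a dump of its AST.

    ast.dump omits line/column attributes by default, so moving a function
    or editing comments/whitespace does not change its dump.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def semantic_diff(old: str, new: str) -> set[str]:
    """Names of functions that were added, removed, or structurally changed."""
    before, after = function_asts(old), function_asts(new)
    names = before.keys() | after.keys()
    return {name for name in names if before.get(name) != after.get(name)}


def cross_reference(changed: set[str], stack_frames: list[str]) -> set[str]:
    """Changed functions that also appear in a reported stack trace."""
    return changed & set(stack_frames)


old = "def parse(x):\n    return int(x)\n\ndef render(y):\n    return str(y)\n"
new = "def parse(x):\n    return int(x.strip())\n\n# reformatted\ndef render(y):\n    return str(y)\n"
changed = semantic_diff(old, new)  # only "parse" changed semantically
suspects = cross_reference(changed, ["main", "parse", "int"])
```

Running this for every revision on every branch, and against every incoming bug report or CI failure, is exactly where the computation and storage cost comes from.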

For ML to work, you need good datasets, and to generate good datasets, you need lots of raw data (static analysis results, code change history, etc.) that can be mined and cross-referenced to produce meaningful data. Creating good datasets for ML in a scalable manner will require you to rethink how to extract, store, and retrieve code-related information.
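As a rough illustration of turning cross-referenced raw data into a dataset, the sketch below joins three hypothetical sources into one labeled row per changed function per revision. The input shapes, the `Row` fields, and the choice of "later implicated in a bug report" as the label are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    revision: str
    function: str
    warnings: int  # static analysis findings for this function at this revision
    buggy: bool    # label: function later implicated by a bug linked to this revision


def build_dataset(
    changes: dict[str, set[str]],           # revision -> changed function names
    warnings: dict[tuple[str, str], int],   # (revision, function) -> warning count
    bug_reports: dict[str, set[str]],       # revision -> functions implicated by bugs
) -> list[Row]:
    """Cross-reference the sources into labeled rows, in a deterministic order."""
    rows = []
    for revision, functions in sorted(changes.items()):
        implicated = bug_reports.get(revision, set())
        for function in sorted(functions):
            rows.append(
                Row(
                    revision=revision,
                    function=function,
                    warnings=warnings.get((revision, function), 0),
                    buggy=function in implicated,
                )
            )
    return rows


rows = build_dataset(
    changes={"r1": {"parse", "render"}, "r2": {"render"}},
    warnings={("r1", "parse"): 2},
    bug_reports={"r1": {"parse"}},
)
```

The join itself is trivial; the hard part the comment points at is producing `changes`, `warnings`, and `bug_reports` for every revision at scale.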

GitSense is designed to be installed on every developer desktop/workstation, which is how I solve the computation problem. Since every developer workstation is designed to be a continuous indexing machine, indexing can be distributed across dozens, if not hundreds or thousands, of machines. Being able to index and cross-reference as fast as possible is absolutely critical, since the goal is to prevent developer mistakes from happening.
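One generic way to spread indexing across many workstations without a central scheduler is for every machine to compute the same deterministic assignment of revisions to workers. The sketch below uses rendezvous (highest-random-weight) hashing. This is an assumption for illustration, not how GitSense necessarily distributes work, and the worker names are hypothetical.

```python
import hashlib


def _score(worker: str, revision: str) -> int:
    """Deterministic pseudo-random weight for a (worker, revision) pair."""
    digest = hashlib.sha256(f"{worker}:{revision}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(revision: str, workers: list[str]) -> str:
    """Worker responsible for indexing a revision: the one with the highest score."""
    return max(workers, key=lambda worker: _score(worker, revision))


def partition(revisions: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Split revisions so that each one is indexed by exactly one worker."""
    plan: dict[str, list[str]] = {worker: [] for worker in workers}
    for revision in revisions:
        plan[assign(revision, workers)].append(revision)
    return plan


workers = ["ws-01", "ws-02", "ws-03"]  # hypothetical developer workstations
revisions = [f"rev{i}" for i in range(100)]
plan = partition(revisions, workers)
```

A useful property of this scheme is that when a workstation goes offline, only the revisions it owned are reassigned; every other revision stays with its current worker.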

Generating semantic analysis is fairly straightforward. Incorporating it is where the challenge lies.