Hacker News new | ask | show | jobs
by int0x80 1405 days ago
Something very close to this can be done and it's in fact done already by static analysis. Static analysis doesn't have any Turing complete problem. Static analysis has a limitation of providing complete answers because of the halting problem, but it can provide instead sound answers. That is, static analysis can provide all possible runtime types for any given (e.g. python) expression. This can be accomplished by doing Abstract Interpretation, or Constraint Analysis and Dataflow Analysis (kCFA), which is in way similar to what you suggest of running the program in a profiler, but instead it runs the program in an abstract value domain. With static analysis you will get some imprecisions (false positives) but no false negatives, so it can effectively be very useful for a developer in an IDE. The precision (amount of FPs) and performance are mutually dependent, but good performance and reasonable (read useful) precision can be achieved, although it's a non-trivial engineering problem.

Additionally, some programs cannot be easily and/or quickly executed to cover all possible paths, so parts of the program will remain uncovered by the profiler. That is one place where static analysis becomes very powerful, because you can cover all the program much faster (in linear time making largish precision tradeoffs, but analysis still yielding a usable result).

source: I work on a flagship static analyzer.

2 comments

Indeed, Abstract Interpretation is powerful and beautiful as heck. Type Systems can be seen as just a special case of general Abstract Interpretation, where the types are the abstract values computed from each expression. The problem is that it requires ingenuity to devise abstract value systems that don't degrade into useless generality as unknown branches in the code 'execute'. Even types only succeed as much as they do because they require extensive and invasive (language-design-changing as well as program-changing) cooperation from both the language designer(s)\implementer(s), as well as the language developer(s), without this cooperation you will leave open very wide gaps. Symbolic Execution is another much much more powerful technique but, predictably, also very inefficient in general.

My way of looking at this is just frustration at the misallocation of resources. Dynamic programs already have tons of typing information available, just not in the right time and place. Your brain spends an aweful lot of time "executing" the code of a dynamic language while it's writing the code, just like the language runtime only far more slowly and with a sky high probability of error. So all what I'm saying is, why this immense suffering, when the information that your brain is in dire need of is already right there in the interpreter's data structures, just in opaque and execution-temporary forms ? It strikes me as incredibly obvious to try to bring in those typing information from the runtime into persistent transparent records available at dev-time.

Abstract Interpretation or Symbolic Executation are (fancy) tools in programming language theory's toolbox that can help us, but the simplest possible thing to try first is to make use of the already-available information that are just lying around elsewhere. To make a somewhat forced analogy : if we have a food crisis at hand, we could try fancy solutions like genetic engineering of super crops or hydroponic farming, but the dumbest possible solution to try first would be to simply bring in food from other places where its plentiful. Typing info in dynamic languages is very plentiful, it's just not there when we need it.

The problem is what I mentioned at the end of my other comment -- complex programs will get inputs from web APIs, console, database, so it's not possible to have complete runtime coverage for all paths in the interpreter on the general case.
do you know any textbooks on abstract interpretation?
"Principles of Program Analysis" covers the subject very well. It's not an easy or beginner book because it's very formal, but very complete and with plenty of examples.