| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by DinoV 4977 days ago

We don't actually execute any code to figure this out, it's all based upon analysis of the code.

Basically how it works is that we do an abstract interpretation of the code. To take this specific example first we analyze the a = {2:42} line, and the dictionary literal produces a new unique value in the system, and obviously we know it's key/value types. From there we'll analyze the "a.values" which will produce a new value in the system for the bound "values" method. Then we do an abstract call on that, and it says that it returns a list (and goes off to the original dictionary value, and produces a list which contains the int objects). And then we do the indexing, and finally we get back the original 42 value that the dictionary was created with. So basically at each step along the way we are just propagating sets of abstract values. A naive implementation of a system like this would iterate until it hits a fixed point - we're a little more sophisticated in that we do a bunch of dependency tracking to know what needs to be analyzed based upon changes in the system. And then we also need to deal with a user editing the program in real time which makes things even more complicated as we need to discard old stale information which has been propagated from edited files. The actual implementation of the basic Python semantics is actually the easy part, it's getting this to perform on large programs and working correctly with live editing which is the difficult part.

This can go wrong in a couple of ways. First off we don't model control flow, so you could have something like:

if False: a = 42 else: a = 'bar'

And we'd think that a was an int or a bar. In the future we might model control flow, but it's a big feature. Mainly I just need an excuse to implement it :). Another way it could go wrong is if the user is using exec/eval which we don't have any insight into. And finally there can be things that we simply don't understand. So while we know about the primitive types and the methods on them, we don't model all of Python's built-in types. So for example if you're using deque in collections you won't get the same results.

All of this is open source (under the Apache license) and the parser and analysis engine are entirely stand alone components, so they could even be re-used by other IDEs. You can check out the source code over here: http://pytools.codeplex.com/SourceControl/changeset/view/3b8...

1 comments

sho_hn 4977 days ago

That's a nice example and good explanation. We're about in the same spot over at KDevelop and call a an "unsure (int, string)" there: http://simplest-image-hosting.net/png-0-n12983

Things like incremental reparsing on edits are handled by KDevelop's DUChain framework, which is shared among the language plugins. The Python language plugin actually reuses the original CPython parser to do the job, and then plugs CPython's AST into KDevelop's DUChain representation.

And just to make me feel less like a thread hijacker, let me add that we've been quite impressed by the work the PTVS team has been doing. You're doing great things for the Python community over there :).

link

DinoV 4977 days ago

Thanks, you're making me want to check out KDevelop to compare now. So far I thought we had the best Python intellisense hands down, but it sounds like you're a pretty good competitor :)

link