|
|
|
|
|
by habitue
4490 days ago
|
|
I'd say you had it right the first time with the list comprehension. List comprehensions in python are way faster than while/append loops because they're implemented in C. The issue is that you had the "not in" which is O(n) over a list (making the comprehension O(n^2) ). If the documents are hashable, I'd suggest making results a set. "not in" is constant time on a set. If a document is something not hashable, you can make results a dictionary and get the same constant time access. In python function/method calls are expensive, so avoid them at all costs inside tight loops. |
|
Now, let's compare with the obvious non-list-comprehension version:
This generates some similar bytecode: The main difference here is that there's a lot more loading and saving of variables (e.g. the "docs" variable which was implicit in the list comprehension version), and things like the docs.append() method have to be loaded each time through the loop and then called, which slows things down some compared to using a single LIST_APPEND opcode.I'm not really going anywhere with this; I just think that interpreter internals are fun.