by csense
4862 days ago
The example he gives for strings could be optimized to near the efficiency of the C version by a sufficiently smart compiler: int(s.split("-", 1)[1])
If the JIT knows that s is the builtin string type and the split() method has not been overridden [1], it can speed this up by using "pseudo-strings," where a pseudo-string is an index and length into another string. Creating each piece would then require only O(1) time and space, since no characters are copied.

Garbage-collecting pseudo-strings would be an interesting exercise, but I'm sure it's a solvable problem [2] [3].

[1] If the preconditions for your optimization don't hold, you can always fall back to interpreting it. As noted by the speaker, this sort of logic is already a critical part of many JITs, including PyPy.

[2] The problem is actually GC'ing the parent. When the parent string is GC'ed, you have to compact the orphan strings to reclaim the remaining space; otherwise it'll be possible to write user code that uses a small finite amount of memory in CPython but has an unbounded memory leak in your compiler.

[3] You can avoid the trickiness in [2] if the parent string can be proven to outlive its children, which is the case in this example. You could probably optimize a lot of real-world code, and have an easier time implementing the compiler, if you only used pseudo-strings when they could be proven to be shorter-lived than the parent. As a bonus, this partial GC would build some infrastructure that could be recycled in a general implementation.
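To make the pseudo-string idea concrete, here is a rough sketch in plain Python of what a JIT might do internally for int(s.split("-", 1)[1]). The names PseudoString, split_once_tail, and parse_int are hypothetical and made up for illustration; they are not part of PyPy or any real JIT. The sketch assumes a non-negative decimal integer after the separator, and it only shows the view itself, not the GC questions from [2] and [3].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PseudoString:
    """Hypothetical view: an (offset, length) window into a parent str."""
    parent: str
    start: int
    length: int

    def materialize(self) -> str:
        # Copies the characters out; the optimization tries to avoid this.
        return self.parent[self.start:self.start + self.length]


def split_once_tail(s: str, sep: str) -> PseudoString:
    """Mimic s.split(sep, 1)[1], returning a view instead of a new string."""
    if not sep:
        raise ValueError("empty separator")
    i = s.find(sep)  # finding the separator is still a scan of s
    if i < 0:
        # s.split(sep, 1) would have one element, so [1] fails.
        raise IndexError("list index out of range")
    start = i + len(sep)
    # Building the view is O(1): no characters are copied.
    return PseudoString(s, start, len(s) - start)


def parse_int(view: PseudoString) -> int:
    """Parse decimal digits straight from the parent's characters."""
    if view.length == 0:
        raise ValueError("empty pseudo-string")
    value = 0
    for k in range(view.start, view.start + view.length):
        ch = view.parent[k]
        if not ("0" <= ch <= "9"):
            raise ValueError(f"invalid digit {ch!r}")
        value = value * 10 + (ord(ch) - ord("0"))
    return value
```

With this, parse_int(split_once_tail("abc-123", "-")) gives the same result as int("abc-123".split("-", 1)[1]), but the tail is never copied into a new string. The view holds a reference to its parent, so in this sketch the parent trivially outlives the child, which is the easy case described in [3].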