Hacker News
by PeterisP 4860 days ago
On the other hand, why should the abstraction layers prevent that? Abstraction layers hide the [hopefully] unimportant low-level choices from me, but "copy or not copy" or "allocate once or allocate thrice" isn't a choice I need to make anyway; the abstraction layer should simply make the 'non-copy' choice for me. That is exactly how the C abstraction layer today makes the proper opcode-ordering choices for me, as well as (or better than) I could manually in assembler.

The problem is that we haven't yet implemented those abstraction layers in this smart way. For example, Haskell can implement 'fusion' of multiple string operations, so that they are merged together and executed without intermediate copies, and the abstraction layer for that is exactly as high-level as the Python examples in the original poster's slides. Sure, it's objectively hard to change core Python like that, but it can theoretically be done, so it should and will be done.


It's not always possible to go with the "non-copy" choice. For example, there are very good reasons for having immutable strings, and once you've made that choice at the language level, every string function you write is going to copy at least once.

I think Alex Gaynor is correct, and that what is basically wrong at the moment is that dynamic languages lack APIs designed with any sensitivity to performance concerns. There's always going to be a hard limit based on the nature of using a JIT vs. a static multi-pass compiler. There's always going to be a hard limit based on fundamental language choices (implementations of primitives, mutable vs. immutable strings, amount of overhead in object instantiation, etc.). But we're nowhere near those limits right now.
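For a concrete sense of what a performance-sensitive string API can look like, here is a small Python comparison. It is only an illustration, not necessarily what Alex Gaynor had in mind: chaining `str.replace` calls can build a new string per call, while `str.translate` applies a whole character mapping in a single pass. The function names and the separator mapping are made up for this example.

```python
# Hypothetical mapping: turn three separator characters into spaces.
SEPARATORS = str.maketrans({"-": " ", "_": " ", "/": " "})


def normalize_chained(s: str) -> str:
    # Each replace() scans the whole string and, when it finds a match,
    # returns a new string, so this chain can build two throw-away
    # intermediates before the final result.
    return s.replace("-", " ").replace("_", " ").replace("/", " ")


def normalize_translated(s: str) -> str:
    # translate() applies the whole character mapping in one pass and
    # builds only the final string.
    return s.translate(SEPARATORS)


print(normalize_translated("a-b_c/d"))  # a b c d
```

Both functions return the same result here because the three replacements do not interact; for mappings where one replacement's output feeds another's input, the chained and single-pass versions can differ.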

Compilers can work around that choice. If you want to do something with immutable strings (Haskell, from the example I mentioned, does have immutable strings), then you will have to make some copy, but you don't need to make a copy for every function: if you're chaining three string functions in a row, the compiler can "fuse" the processing so that only a single, final copy is made, not the intermediate ones.
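A minimal Python sketch of the fusion idea, assuming every string function in the chain can be written as a context-free per-character transform. This is a simplification; Haskell's library-level fusion is considerably more general. `ascii_lower`, `replace_char`, `unfused` and `fused` are hypothetical names. The unfused version builds a whole new string per step; the fused one runs the entire chain on each input character and builds only the final string.

```python
from typing import Callable, List

# Hypothetical model: a string function as a per-character transform
# (one char in, zero or more chars out). Only context-free operations fit.
CharOp = Callable[[str], str]


def ascii_lower(ch: str) -> str:
    # ASCII-only stand-in for lowercase(); Python's str.lower() is not purely
    # per-character (e.g. final sigma), so it is avoided here.
    return ch.lower() if "A" <= ch <= "Z" else ch


def replace_char(old: str, new: str) -> CharOp:
    # Per-character stand-in for replace(old, new); old must be one character.
    return lambda ch: new if ch == old else ch


def unfused(s: str, ops: List[CharOp]) -> str:
    # One full pass per operation: each op materializes a whole new string,
    # so a chain of three ops creates two intermediate strings.
    for op in ops:
        s = "".join(op(ch) for ch in s)
    return s


def fused(s: str, ops: List[CharOp]) -> str:
    # Run the whole chain on each input character before moving on, so the
    # input is traversed once and only the final string is assembled.
    def run_chain(ch: str) -> str:
        out = ch
        for op in ops:
            out = "".join(op(c) for c in out)
        return out

    return "".join(run_chain(ch) for ch in s)


chain = [ascii_lower, replace_char("x", "y"), replace_char("-", "_")]
print(fused("Xa-X", chain))  # ya_y
```

In CPython this is not actually faster, since each character still passes through small temporary strings; it only shows the shape of the rewrite a fusing compiler would perform.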

In any language, the compiler may know which values won't ever be used again; for example, in pseudocode:

  b = a.lowercase()
  c = b.replace("x","y")
  d = a.lowercase().replace("x","y")
Both 'b' and the intermediate result in 'd' are strings, but the compiler can flag these two 'throw-away' values as mutable strings (while still keeping the promise that all programmer-visible strings are immutable), and the standard library can provide a special version of 'replace' that does no-copy, in-place replacement in such cases. It means extra work in building the API/stdlib, but it brings better performance for the same programs.
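One way to sketch the throw-away-intermediate idea in Python, assuming ASCII input and single-character `old`/`new` (so replacement never changes the length): the lowercase step writes into a private `bytearray`, and the hypothetical no-copy `replace_byte_in_place` edits that buffer directly, which is safe only because nothing else can observe it. The hypothetical `lower_then_replace` stands in for `d = a.lowercase().replace("x","y")`.

```python
def replace_byte_in_place(buf: bytearray, old: int, new: int) -> None:
    # Hypothetical no-copy 'replace': mutates the buffer directly. Safe only
    # because the buffer is a private temporary, and only for same-length
    # (one byte for one byte) replacement.
    for i, byte in enumerate(buf):
        if byte == old:
            buf[i] = new


def lower_then_replace(a: str, old: str, new: str) -> str:
    # Models d = a.lowercase().replace(old, new), where the lowercase
    # intermediate is never visible to the programmer.
    # Assumes ASCII input and single-character old/new.
    buf = bytearray(a, "ascii")  # the lowercase step's one copy of the input
    for i, byte in enumerate(buf):
        if 65 <= byte <= 90:  # 'A'..'Z' -> 'a'..'z', edited in place
            buf[i] = byte + 32
    replace_byte_in_place(buf, ord(old), ord(new))  # no new string here
    return buf.decode("ascii")  # the only string the caller ever sees


print(lower_then_replace("XaXb", "x", "y"))  # yayb
```

Python cannot hand a `bytearray` over to an immutable `str` without copying, so the final `decode()` is a real copy here; a runtime that controls both representations could skip it, leaving the lowercase step as the only copy.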