One main thought on this topic -- languages like Haskell and lisp also have very poor support for direct memory control, but tend to be viewed (perhaps untruthfully?) as much closer in performance to C than Python/Ruby.
Haskell and languages in the ML family have a lot of opportunities for elaborate static analysis, which often allows the resulting programs to be quite clever about optimizing the resulting programs. As one example, the GHC Haskell compiler uses loop fusion to combine multiple passes over a list into a single pass with no intermediate copies of the list produced. Consequently, Haskell code like
map f (map g (map h someList))
is going to involve allocating exactly one list of the same size as someList, while a direct translation into Python
map(f, map(g, map(h, someList)))
is going to involve the creation of several intermediate lists.
Most of the Haskell optimizations are aimed at making high-level code faster, not absolute speed. Something like loop fusion is trivial to do by hand, so it doesn't confer any advantage on hand-optimized Haskell programs. And when you're comparing to C, you're in the realm of performance where hand-optimizing code at the expense of simplicity, maintainability or even correctness is necessary. Otherwise you wouldn't have considered C.
So high-level optimizations like loop fusion do matter, but only for the 95% of the code which is not entirely performance critical. It's great for letting you write pretty (and therefore easier to write and more maintainable) code but not great for squeezing out all the possible performance in an inner loop.
Considering that many GHC optimizations are of this nature, it's really impressive that GHC is still good at speed in an absolute sense as well. So if you desperately need to wring out an extra bit of performance, you can still do it in Haskell instead of having to use C. Sure, the highly optimized Haskell will be relatively ugly, but it's still better than C and much easier to integrate with the rest of your codebase.
In the Haskell case you could also do something like this to avoid needing that optimisation:
map (f . g . h) someList
And in Python 3, map returns an iterator, not a list, so you aren't building the full list until you ask for it, and you never build intermediate lists in your example. You can do the same thing in Python 2 with the itertools.imap function.
`map` is the trivial case. There are plenty of loop compositions that are completely non-trivial to do by hand. That's why array fusion (see e.g. repa or vector) is a huge win.
foldl g . scanl y . concatMap x . filter h . unfoldr k
Fuse that by hand.
This is why we have optimizing compilers. They do what you could have done, only more often, and without mistakes.
Using generators in Python will get similar laziness in Python:
from itertools import imap
map(f, imap(g, imap(h, someList)))
I think Python 3's map built-in is a generator so you no longer have to use the itertools module.
Unfortunately we don't have . or currying in Python so no pointfree python :(.
from itertools import ifilter
# ugly python function with a "Maybe dict" return type
def query(data, date):
"""Return the first dict where date is > x['date'] or None"""
is_greater = lambda x: date > x['date']
return next(
ifilter(
is_greater,
data
),
None
)
No, Haskell really does have significantly better performance than Python or Ruby. Same with some Lisp variants as well--there was even a research whole-program optimizing compiler for Scheme that apparently sometimes beat hand-optimized C (Stalin Scheme).
So one way to improve performance is simply by having a good compiler. And GHC, at least, is a very good compiler.
Also, Haskell support for direct memory control is not that bad. In fact, in some ways, it's better than even Java--you can have your own unboxed data types which essentially act like structs, for example. This also means that you can have unboxed arrays of more than just primitive types.
Haskell also does some very clever things with both the heap and the stack, but I'm not familiar enough with its internals to comment. My understanding is that Haskell makes heap allocation much cheaper and has a GC optimized for handling lots of small allocations (as you would expect for a functional language).
Ultimately, the point is that the question is fairly nuanced and you won't be able to pin down a single language or implementation feature that uniquely determines performance.