by darkf 3434 days ago
>However, many are subjective preferences

Certainly, it is titled "Problems I Have" for a reason. :-) I do not expect everyone to agree with me, but it is what I feel I personally lack when using it quite a lot.

> I imagine this statement could offend some of the smart and hard-working people who are working on improving the python language.

That was certainly not my intention -- as stated, I do love the language and appreciate all work going into it. I do not intend to undermine their efforts, just point out some of my perceived design flaws.

>- the community does not agree with the author's subjective idea of what Python should look like

I think we all agree there should be a good solution to concurrency ("stackless" variants, which power eventlet etc., have been used for ages, as has Twisted, of which asyncio is not a sufficient clone), parallelism, etc.

The standard library in general encourages use of higher-order functions and concepts borrowed primarily from FPLs (see: comprehensions, map/reduce, sort, etc.) I could not imagine seeing them backtracking on this -- it only helps them to go further in that direction.
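For anyone unsure which idioms are meant here, a minimal sketch of the FP-style builtins next to their comprehension equivalents. The word list and variable names are made up for illustration; only `map`, `filter`, `sorted` and `functools.reduce` are real stdlib calls.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
words = ["gil", "asyncio", "twisted", "pypy"]

# Higher-order builtins borrowed from functional languages...
lengths_map = list(map(len, words))
long_filter = list(filter(lambda w: len(w) > 4, words))

# ...and the comprehension spellings of the same operations.
lengths_comp = [len(w) for w in words]
long_comp = [w for w in words if len(w) > 4]

# reduce lives in functools in Python 3; sorted takes a key function.
total_length = reduce(lambda acc, n: acc + n, lengths_map, 0)
by_length = sorted(words, key=len)

print(lengths_map, long_filter, total_length, by_length)
```

Both spellings produce the same lists, so which one reads better is mostly a matter of taste.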

>- the solutions to a problem (I'm thinking GIL) come with a lot of consequences which are not readily acceptable

I did not propose a solution because there are many, as you note; there are, however, implementations with decent solutions like AFAIK Jython.

>- solving some of the issues would exacerbate backwards compatibility.

Such as what?

> I did not propose a solution because there are many, as you note; there are, however, implementations with decent solutions like AFAIK Jython.

There are no solutions that satisfy everyone that I am aware of yet. Guido has said in the past that he'd be happy to get rid of the GIL, and would merge a patch that solves it, as long as:

* It does not reduce the performance of single-threaded Python code.

* It stays compatible with all existing pure Python code and C extensions.

But in practice, the GIL is not that much of an issue for many types of applications where Python is popular.

* It's not an issue for web apps, because these are typically served from multiple physical servers each running multiple python processes. These do not share a GIL anyway, and "thread safety" is pushed to database transactions.

* It is not an issue for apps which spend most of the time doing I/O. Most IO libraries release the GIL, and other threads can run while you're waiting for results from the database.

* It is not an issue for data science doing heavy number crunching with numpy and everything built on top of numpy. Numpy releases the GIL while doing large computations in C.

* It is not an issue for small scripts, as a "better than Bash".

The GIL is only an issue for apps that do heavy computation in pure Python code, and need parallelism within a single process (socket servers? text data processing?). As a result, many Python users just don't find it a big enough problem to be worth solving, if the solution comes with downsides for their use cases.
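A rough sketch of the I/O-bound case, assuming `time.sleep` is an acceptable stand-in for a blocking database or socket call (it releases the GIL while waiting). The `fake_io` helper and the 0.2 second delay are made up for illustration. It says nothing about what happens when a CPU-bound thread competes with the I/O threads.

```python
import threading
import time

DELAY = 0.2  # seconds each pretend I/O call blocks (arbitrary choice)

def fake_io(delay):
    # time.sleep releases the GIL, so other threads can run meanwhile.
    time.sleep(delay)

threads = [threading.Thread(target=fake_io, args=(DELAY,)) for _ in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# If the waits did not overlap, this would take about 4 * DELAY.
serial_estimate = DELAY * len(threads)
print(f"elapsed {elapsed:.2f}s vs serial estimate {serial_estimate:.2f}s")
```

On an otherwise idle machine the elapsed time should land near one delay rather than four, which is the overlap the bullet above is talking about.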

The GIL is only a problem because there's "no free lunch" -- no single strategy that is best in all cases.

> It is not an issue for apps which spend most of the time doing I/O.

This is a common misconception that doesn't seem to be backed up by any data.

Dave Beazley did a number of performance tests with profiling, looking at GIL contention in a multi-core scenario: http://www.dabeaz.com/python/GIL.pdf

The results were that even IO-bound workloads still suffered because of the poor implementation of the GIL (details on slide 35 or so). This was an issue up until Python 3.2 (!) when a new GIL implementation was added, which he also profiled: http://www.dabeaz.com/python/NewGIL.pdf

Not backed up by data? Maybe because it's so easy to document that no one bothers to write about it?

Multithreaded IO-bound tasks don't care about the GIL.

Yeah the old implementations of the language were not as good as the latest. It doesn't seem right to criticize the language for problems that have already been fixed.

"IO-bound multithreading is fine" has been the Python mantra for the last 20 years. Lo and behold, someone actually gathers some data and comes to find out that is absolutely wrong. For the last few years they've had a revamped version of the GIL, but that still has a burden of proof that can only be validated by profiling real-world applications.

A community can't make flat-out invalid claims for two decades and then expect everyone to take them at their word that everything is fine now.

>>- the solutions to a problem (I'm thinking GIL) come with a lot of consequences which are not readily acceptable

> I did not propose a solution because there are many, as you note; there are, however, implementations with decent solutions like AFAIK Jython.

>>- solving some of the issues would exacerbate backwards compatibility.

> Such as what?

Let's use the GIL as an example. The reason it is still present is not that it is hard to remove; it has already been removed in the past. The problem is that when the GIL is replaced with smaller locks, Python becomes much slower, because of some features and behavior that people have gotten used to.

One could make Python faster by changing behavior, but then it would break existing code and C extensions.

There's no easy way to do this without sacrificing something else. Larry Hastings is working on removing the GIL and has an interesting talk about it [1].

[1] https://www.youtube.com/watch?v=fgWUwQVoLHo

That talk was a great overview of the issue.

Most people who are ignorant about the subject assume the GIL is stupid and useless, but the GIL allows Python to be extremely fast in single-threaded scenarios, and any attempt to remove it introduces at least 20% slowdowns.

And C extension support is also a huge factor as you mention. All of Python's scientific modules would be lost if they broke that.

The author assumes these are simple problems that can be solved but aren't because of politics or incompetence, but some of the smartest minds have attempted and failed.

I'd like the author to try to come up with solutions, or at least draft ideas, for how each of his points can be fixed. A lot of those are easy to state but difficult to solve without breaking more stuff.

  > with decent solutions like AFAIK Jython.

  >>- solving some of the issues would exacerbate backwards compatibility. Such as what?

Jython can't load native C extensions, which should be GIL aware. Most programs, and the Python interpreter itself, aren't thread safe, so suddenly removing the GIL would break a lot of programs.

I agree with you that the concurrency story for python sucks, but claiming solutions could exist without breaking back-compat is just not right.

FWIW, the folks working on TruffleRuby have done some amazing things on this front - essentially making an interpreter for Ruby C extensions that JITs to code that's more-or-less identical in performance to the original native version.

Is Jython good though?

Last I used it, it had issues keeping pace, e.g. a demonstrable memory issue that took a long time to fix, and it lagged considerably behind Python 2/3 versions.

It's also worth noting that, as an essentially transcompiled language, you need to have a good appreciation of Java machinery, in which case languages like Groovy provide good competition.

When talking about JVM languages suitable for building systems, Jython isn't generally mentioned for the reasons you give, nor is Apache Groovy. Those two are good for scripting, e.g. testing Java classes, build scripts, glue code. Besides Java, languages like Clojure, Scala, and Kotlin are usually considered as systems languages on the JVM.

True, but I believe Groovy and Jython compete for the same space, if not system-building.

>but claiming solutions could exist without breaking back-compat is just not right.

I mean, you have a good point on C extensions but code relying on them (without a really portable API) is almost never going to be forward compatible anyway. (There are still quite a few C extensions not up to CPython 3 yet.)

I was taken aback by this rather harsh treatment of Python.

Is it really realistic to 'have it all'? I'm fully aware that I'd have to go to crazier languages if I want parallelism or speed. For what Python is, it offers me reasonable tradeoffs (mostly slanted towards productivity).

Regarding the FP comments, since it lacks TCO, my takeaway has always been that Python can only ever become a quasi-functional language. It's hard to be more than that in its current state.

Anyways. These questions made me want to ask you - what languages do you think are better in comparison?

> I was taken aback by this rather harsh treatment of Python.

I am taken aback by the evangelical tone of Python enthusiasts, when it has warts intentionally maintained by the creator in the form of missing features.

If you want speed you go to any other scripting language (other than Ruby). I agree Python is mostly sane and naively productive. That being said, it's a result of the syntax. Transpiling it to another language, like Google did, shows that the underlying technology is not worth much.

> what languages do you think are better in comparison

Better in what way? PHP, Go, Pony, Javascript all have these features and the problems with the languages are not that people don't understand when they come across a switch or map.

> If you want speed you go to any other scripting language (other than Ruby).

Ruby has historically had the same issues. Most Pythonistas I know aren't so evangelical. It's mostly a question of how to go about integrating C/C++ code.

Many people complaining about the GIL (and the like) have some naive microbenchmark, don't understand the trade-offs/limitations of their runtime etc. That doesn't mean critique isn't important and required, but it's going to be better when it's properly researched and improves on the body of work out there (https://www.youtube.com/watch?v=Obt-vMVdM8s).

> Transpiling it to another language like Google did, shows that the underlying technology is not worth much.

How is this any general indicator of the worth of the language?

It shows for some cases, that Google thought this was a worthwhile investment. Google has experimented for a long time with ways to improve how Python code can be run. They ran the Unladen Swallow project, but spent more time on LLVM issues at the time making it infeasible to continue the project.

They'll discontinue one path and try another. None of this is really a commentary from Google on CPython, the community, or the value that it has for most people. The people working on this stuff interact on a pretty friendly basis.

> If you want speed you go to any other scripting language (other than Ruby)

Which one? PHP? Perl? Bash? Scheme? VBScript? Windows PowerShell?

Python is in fact one of the fastest scripting languages that exist, especially JIT'ed.

The notable exception is JS, and oh, that has a GIL too :P

Tail-call optimization was left out to keep the CPython interpreter simpler and to make debugging easier by preserving the call stack. It was a choice, not an oversight.

From a debugging viewpoint, this does not make sense: there is usually no interesting information in the in-between frames.

TCE can also make debugging easier. How useful is a stack trace of 1000 lines consisting of

  ...
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
  ...

Not so much I think.

I like how you picked an example that is explicitly not TCO-able.

In any case, this particular problem is no longer an issue as of Python 3.6, as that now collapses repeated stacktrace lines (see https://bugs.python.org/issue26823). Although this doesn't work for mutual tail calls, it does solve the debug noise issue in the most common case.
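A small sketch of that collapsing behavior, assuming CPython 3.6 or later. The `countdown` function and the depth of 50 are made up for illustration; the only library call is `traceback.format_exc`.

```python
import traceback

def countdown(n):
    # Hypothetical recursive function; the same source line calls itself 50 times.
    if n == 0:
        raise ValueError("bottom reached")
    return countdown(n - 1)

try:
    countdown(50)
except ValueError:
    text = traceback.format_exc()

# On 3.6+ the identical frames are folded into a "[Previous line repeated ...]" note.
collapsed = "Previous line repeated" in text
frame_lines = text.count("in countdown")
print(text)
```

Instead of 50 near-identical entries, the formatted traceback shows a handful of `countdown` frames plus the repeat note.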

Ah yes, that is a bit stupid, I just wanted an example of a traceback :)
The notion that not having proper tail calls aids debugging always seemed like a post-hoc justification. The stack trace of an iterative function will lack exactly the same intermediate evaluation frames as a tail-recursive implementation.
The thing is, tail calls aren't _just_ about emulating iteration via recursion:

  def foo():
      raise ValueError

  def bar():
      return foo()

  bar()
With TCO, the stack trace would contain `main` and `foo`, as `bar`'s frame would be overwritten by `foo`. This example is simple, but `bar` could be a 50 line long if-else chain of tail calls and when debugging you won't necessarily know which condition was evaluated.
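For comparison, a sketch of what CPython records today for the same example, with a made-up `main` wrapper added so the outermost frame has a name. Since CPython does not eliminate tail calls, `bar` stays in the trace.

```python
import traceback

def foo():
    raise ValueError

def bar():
    return foo()  # a tail call: bar has nothing left to do afterwards

def main():
    # Hypothetical wrapper, added only so the top frame is called "main".
    try:
        bar()
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]

names = main()
print(names)  # CPython keeps every frame: ['main', 'bar', 'foo']
```

Under TCO the middle entry would be gone, which is exactly the information you would want when `bar` is a long if-else chain.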
> The thing is, tail calls aren't _just_ about emulating iteration via recursion:

I completely agree, but there is also no need to perform TCO to make code like this safely runnable. TCO only becomes necessary/useful when implementing an iterative process where we can't statically know that the call stack won't be exhausted. That said, TCO is usually an all or nothing transformation, and it would be difficult to accurately avoid eliminating trivial tail calls like in your example.

A reasonable compromise might be for the Python VM to implement a TAIL_CALL bytecode op and require the programmer to decorate functions which rely on TCO. This wouldn't be any more onerous than manually trampolining large portions of code, which is the current method of getting around the lack of TCO.
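For readers who have not seen it, a minimal sketch of manual trampolining. The names `trampoline` and `sum_to` are hypothetical, and this is plain Python rather than the proposed TAIL_CALL op: the function returns a zero-argument thunk instead of making the tail call itself, and a loop keeps calling thunks until a non-callable value comes back.

```python
def trampoline(fn, *args):
    # Run fn, then keep invoking returned thunks in a loop,
    # so the Python call stack never grows with the "recursion" depth.
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def sum_to(n, acc=0):
    if n == 0:
        return acc
    # Return the tail call as a thunk instead of performing it.
    return lambda: sum_to(n - 1, acc + n)

total = trampoline(sum_to, 100000)
print(total)
```

This runs 100000 "recursive" steps without hitting the recursion limit, but it only works if the final result is not itself callable, and every function in the chain has to be written in this style.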

Why not just make it a dev/production flag, then?
Probably because many tail-recursive functions _rely_ on tail-call elimination working reliably. Without also having an unbounded call stack, disabling tail-call elimination will likely just cause your programs to crash.
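A quick sketch of that failure mode in CPython as it stands, where there is no tail-call elimination and the call stack is bounded. The `count_down` function is made up; `sys.getrecursionlimit` and `RecursionError` are the real stdlib names.

```python
import sys

def count_down(n):
    # Tail-recursive in form, but CPython still pushes a frame per call.
    if n == 0:
        return "done"
    return count_down(n - 1)

# Ask for more depth than the interpreter allows.
depth = sys.getrecursionlimit() * 2

try:
    outcome = count_down(depth)
except RecursionError:
    outcome = "RecursionError"

print(outcome)
```

A function written to rely on elimination would hit this as soon as the flag were switched off and the input got large enough.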
I never considered it harsh. If anything, it should be a testament to how nice Python is -- if I /didn't/ like it, I would have a much, much longer list of complaints!

People seem to be missing that sentiment -- I do love Python and use it almost daily. This is merely a list of thorns I run into frequently.

>Regarding the FP comments, since it lacks TCO, my takeaway has always been that Python can only ever become a quasi-functional language. It's hard to be more than that in its current state.

I mean, it could always encourage playing with functions more -- and importantly, providing an stdlib that encourages that.

>These questions made me want to ask you - what languages do you think are better in comparison?

That is a somewhat loaded question: my counter question would be, "In what regards?"

I cannot say a certain language is better than Python in every or most circumstances, but I can in regards to specific points/features, if you'd like to elaborate.

But it was harsh. You started the post with

> These are obvious flaws in design, in my opinion, that warrant re-looking at, but to which no real improvements are being made for some reason. (Incompetence? Politics? Both? Who knows.)

You list a number of things which Python ecosystem should do better or differently, and suggest that the reason why these changes aren't getting built as fast as you'd like is people playing politics or incompetence.

Whereas in reality, the two main reasons are that the changes will take lots of effort and time, which the volunteers don't have next to their day jobs (PyPy), or that the developers have different opinions on the ideal language design, and just don't agree with you (heavy functional programming).

"It is my personal subjective opinion that you are incompetent" isn't less harsh than "You are incompetent".

Once, I was on mailing lists with GVR and other language contributors, and have seen him go off deeply into functional programming. Some of the stuff he wrote went right over my head.

For someone who declares he hates functional programming even at the most basic levels of data stream manipulation, he knows it quite well.

I've always been frustrated with this disconnect, even more acutely than you have, because I know GVR is being disingenuous when he says, "I don't get it." He absolutely does. He thinks other people won't.

If you claim it's "incompetence" (it's not) then publish the patches that don't break anything but significantly speed it up, for example. Because you complain "it's slow."

It's not what you think it is. It's Python, not a toy language used by nobody. Just first try yourself to "fix" Python and keep its existing users happy by not breaking anything for them, then write about it.

Cool, I think you completely dodged the point. I never said it was slow because they were incompetent.
Python has made some trade-offs that you dislike. You complain about the negative consequences without comparing those against the benefits.

One of the major factors in speed is efficient memory layout. Contrast a Python list with a NumPy array. To achieve speedier loops and vectorized arithmetic [0], the array gives up dynamic typing and dynamic sizing. In most applications, I would gladly give up some compute speed to gain some programming productivity.
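A stdlib-only sketch of the layout trade-off, using `array.array` as a stand-in for a NumPy array (that substitution is mine, so the example does not need NumPy installed). Unlike a NumPy array, `array.array` can still grow, so this only illustrates the typing half of the trade. Sizes assume CPython, where `sys.getsizeof` is meaningful.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

boxed = list(values)          # list of pointers to separate float objects
packed = array("d", values)   # contiguous C doubles, one fixed type

# Raw payload of the packed array: 8 bytes per double.
packed_bytes = packed.itemsize * len(packed)

# The list pays for its pointer table plus every boxed float object.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)

# Dynamic typing: the list takes anything, the typed array refuses.
boxed.append("text")
try:
    packed.append("text")
    accepted = True
except TypeError:
    accepted = False

print(packed_bytes, boxed_bytes, accepted)
```

The packed layout is several times smaller and cache-friendlier, and the price is that it will not hold an arbitrary object.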

I love duck-typing. Formal typing has some impressive examples, but in the projects I've worked on it has reduced my productivity. Perhaps because data are so often serialized to simple formats or written to databases that discard the best tools of formal typing. Anecdotal evidence, for sure.

I've never been bothered by the GIL, but I have benefitted from it [1].

[0] https://docs.continuum.io/mkl-optimizations/

[1] https://www.youtube.com/watch?v=P3AyI_u66Bw

>Contrast a Python list with a NumPy array. To achieve speedier loops and vectorized arithmetic [0], the array gives up dynamic typing and dynamic sizing. In most applications, I would gladly give up some compute speed to gain some programming productivity.

Except numpy arrays have a much richer interface and can still store dynamic objects (dtype=object). So what's your point?

>I love duck-typing

So do I. Where does this come from? I don't believe I ever considered it a contra.

Have you ever tried appending to a NumPy array in a loop? It's a total disaster! And dtype=object arrays are mostly useless; they gain almost none of the benefits of regular NumPy (you may as well run np functions on plain lists) and play poorly with other types. NumPy is great for numerics and structured data - lists are general purpose structures for data manipulation. They are different, have different goals and trade offs, and I don't think it's appropriate to claim that one size should fit all.
If you're storing generic objects rather than numbers in a NumPy array, you're discarding its main benefits. Sure, you've got some extra slicing sugar for selecting columns and subsets, but comprehensions are more readable (and faster!) in many of those situations.

Duck-typing is Python's form of dynamic typing and therefore results in the speed penalty. If you want the extra speed, you'll need to give up some dynamism. I say this now, but some of the work the core devs are doing to optimize dicts might let us have our cake and eat it too. Until then, it's a choice: flexible or fast, not both.

> One of the major factors in speed is efficient memory layout.

> In most applications, I would gladly give up some compute speed to gain some programming productivity.

By Smalltalk standards, Python is pretty profligate. (By 90's C programmer standards, Smalltalk is pretty profligate.) However, Smalltalk still has many of the same high-productivity features as Python. (In fact, the debugging story is far superior.) I suspect, though, that Python is still a far superior environment for the things you use it for.

I enjoyed coding homework assignments in Smalltalk, but for some reason I never tried using it professionally.
Your post quite strongly alludes to it being either due to incompetence, or politics, or both. So I think grandparent has a very valid point, and you might want to change the tone of your post a bit; then it'll produce fewer knee-jerk reactions, and might be taken more seriously.
Nah, just people connecting that sentiment with other statements. It should be cleared up since it's causing some mass confusion.

It's funny because I preface it by saying "Remember that it's a matter of opinion" (and, well, the title alone) and people come out of the woodwork completely disregarding this, or outright misinterpreting sections of it.

I maintain that a large reader base here does not actually... read.

Stating that something is a matter of opinion does not mean that insults do not hurt. Either you don't want people to listen to you so you don't have to worry about what you say or you do want people to listen in which case how you phrase things matters.

Or put another way: I'm a core developer of Python and I found your post somewhat insulting (I've unfortunately seen worse). You claim I'm possibly incompetent and I did a half-assed job with asyncio. You very "audibly" sigh and call my work "nonsense". You ask me to "come on" and accept your view on things when I have apparently helped make a "gimped language". And you end by saying I need to "fix [my] language". None of that phrasing comes off as understanding of the hard work and immeasurable number of hours I have put into making sure Python continues to function well for you over the past 14 years that I have been a core developer. I know you like Python as you stated in the post and in the comments here, but that doesn't wash away the rest of the unnecessary negativity in your post such that I want to take your opinions seriously enough to spend the time to explain why things are the way they are.

Fair enough, I never intended it to be posted to HN or receive this much attention or I'd have taken much more care with the tone.

I retract the "ignorance" sentiment, as I do not actually know what Python developers are considering.

I do apologize for that, and thank you for your contributions. It is still by far one of my favorite languages, and I use it daily. :)

Hey, Brett. Thanks for your work :-)
> I maintain that a large reader base here does not actually... read.

I would disagree.

Although there's often a fair number of commenters who clearly read the title of an article, and then just start commenting on that, in this case we can see that people have read (at a minimum) your opening statement and whichever list item they're taking issue with.

I believe that if a large set of people are misinterpreting what I've written, it's a sign that I probably wrote it poorly. Not in the sense of arguing for the wrong thing, but in the sense that I'm not conveying my argument well enough. This is easy to do, because when I'm writing something, I know what I mean. This sounds obvious, but it's hard to read what I've written tabula rasa, without bringing that "of course I mean X" view to it.

So, the feedback you're getting here is that your opening statement colors the rest of your piece. That "Incompetence? Politics? Both?" aside obviously makes a large subset of readers assume that you're saying "all of these complaints in my list must be unaddressed due to incompetence or politics". That's at odds with the later "just an opinion" statement, and people are sticking with the more-inflammatory initial claim.

I'm pretty sure everybody here can read. We are asserting that your sentence:

"These are obvious flaws in design, in my opinion, that warrant re-looking at, but to which no real improvements are being made for some reason. (Incompetence? Politics? Both? Who knows.)"

sets a very negative tone and makes your "list of problems" look like a "list of complaints that these incompetent losers should fix asap". In my opinion you may just not be the best writer for some reason (Lack of education? Incompetence? Both? Who knows).

>In my opinion you may just not be the best writer for some reason (Lack of education? Incompetence? Both? Who knows).

Then right back at you -- that triggers the same problems as the original statement. :D

You could very well have stated your point more constructively with that.

Makes me think of something I read about cultural divides. Some cultures think that a speaker can say whatever they feel like, and it's the listener's obligation to figure out how to understand it. Others think that it's the speaker's obligation to structure and phrase things in a way that make it clear to the listener what they meant.

Not looking to make value judgements of whether one is generally better, but I think it's clear that when writing on the internet for general audiences, the second way is more effective in spreading your point.

I've read your piece, and unfortunately the overall tone sounds like a rant. I'm sure it wasn't your intent, but tone is hard to convey in a purely textual medium sometimes. I fall victim to this often, and have been actively working to try to avoid excessively negative tone (even if I feel that way).

The ranty tone of the piece obscures the rest of the points you were trying to make - many are good, but a strong tone will immediately put people on the defensive rather than trying to open up and understand what's being said.

Yes, I'm seeing now that the tone of the post is more talked about than the actual contents. Learning!
The thing is, by saying that your incompetence/politics remark is just an opinion, and by not going out of your way to offer a factual justification, you have made it clear that it is essentially an information-free statement, while its snide disparagement of people who have put a lot of effort into Python lingers undiminished, if not actually emphasized as such by its lack of information. Then you double-down by taking on all the people who see that this is so. As a consequence, the currently-top issue in the comments is this, and not whatever it is you think someone should be doing to improve Python.
> I never said it was slow because they were incompetent

Your own article appears to have exactly that claim:

> These are obvious flaws in design, in my opinion, that warrant re-looking at, but to which no real improvements are being made for some reason. (Incompetence? Politics? Both? Who knows.)

> Without further ado: The standard interpreter being rather slow; PyPy is nice, but its Python 3 support is very immature.

Oh don't be silly, he's applying multiple possible reasons to the set of frustrations. You're applying all of them to a single frustration. What you've done isn't logical.
He claimed that properties P or Q apply to each of A, B, C, D, and E, and you claim that I can't conclude he said P or Q applies to A?

I used "A has property P"; substitute "A has property Q" and it is still a wrong claim on his part, for the very same reasons:

Python has a huge user base. If he can improve it on any of his points (I just selected one) and keep it working for the user base, please do. I know he won't be able, his talk is of ignorance and provocation.

And those who develop it aren't stupid or playing politics; they keep Python working for their user base, improving it as much as they can.

See https://news.ycombinator.com/item?id=13485784

>I know he won't be able, his talk is of ignorance and provocation.

Says the person missing that it's an opinionated list, lol. It was more meant to provoke discussion, not hurt feelings (like yours obviously seem to be).

> The standard library in general encourages use of higher-order functions and concepts borrowed primarily from FPLs (see: comprehensions, map/reduce, sort, etc.) I could not imagine seeing them backtracking on this -- it only helps them to go further in that direction.

I don't think the standard library really 'encourages' this in a way that's different from most languages that support first-class (rather than 'higher-order') functions. If you consider Python's evolution, its support for many common programming paradigms was somewhat haphazard and weak, and developed over time, mostly pragmatically. OO has become stronger, the 80% use case of common functional idioms is covered by comprehensions, etc. Ill-thought-out features (e.g. terrible lambdas) have become de-emphasised. The choices are extensively documented, even if they are not everyone's cup of tea, so it seems both glib and inaccurate to say (re: FP) 'it only helps them to go further in that direction'. How does it help them?