Hacker News
by notzorbo2 3864 days ago
This happens pretty often in the Python world. There's a bit of an unwritten rule to leave implementation details public that would be private in other languages. Many libraries simply don't bother with prefixing privates with '_' and just leave things undocumented that you probably shouldn't touch/use.

One notable example is that any library you import in your module is automatically exposed to whoever imports that module.

    $ cat lib.py
    import re
    
    def somefunc():
        pass

    $ python
    >>> import lib
    >>> lib.re
    <module 're' from '/usr/lib/python2.7/re.pyc'>
Many packages also do `import *` from files, which pollutes the package namespace with all kinds of stuff you really don't want in there. For example, the popular Requests package:

    >>> import requests
    >>> requests.logging
    <module 'logging' from '/usr/lib/python2.7/logging/__init__.pyc'>
The logging module is not a public part of requests' API. It's just there because requests uses it internally.
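For what it's worth, a library that wants to avoid this can alias its internal imports with a leading underscore. The sketch below builds a throwaway module in memory to show the effect; `tidy_lib` and `somefunc` are hypothetical names, and this is not how Requests itself is written.

```python
import types

# Source of a hypothetical library that hides its internal import
# behind an underscore-prefixed alias.
source = '''
import logging as _logging   # underscore alias marks the import as internal

def somefunc():
    _logging.getLogger(__name__).debug("called")
    return "ok"
'''

# Create the module object by hand so the example needs no files on disk.
tidy_lib = types.ModuleType("tidy_lib")
exec(source, tidy_lib.__dict__)

# Only names without a leading underscore look public to a caller.
public_names = [name for name in dir(tidy_lib) if not name.startswith("_")]
print(public_names)             # ['somefunc']

# The import is still reachable, it is just visibly marked as internal.
print(tidy_lib._logging.__name__)   # logging
```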

So to answer your question, I'd say it's just common practice. If it's undocumented in Python, you should pretend it doesn't exist.

2 comments

Importing libraries from other libraries like this is very useful:

    $ python
    >>> import lib
    >>> lib.re
That's how __init__.py files work, and is part of what makes Python awesome. Using `import *` is very bad practice in modules (except in very specific cases) because it brings in a bunch of crap you don't want and didn't expect. Modules should define a '__all__' list of 'public things' you want to export, but restricting access is very anti-python as we're all consenting adults.
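A minimal sketch of what `__all__` does and does not do, assuming a made-up module called `demo_lib` that is built in memory so the example is self-contained:

```python
import sys
import types

# "demo_lib" and its contents are hypothetical names used only here.
source = '''
import re

__all__ = ["somefunc"]

def somefunc():
    return "public"

def helper():
    return "not exported"
'''
demo_lib = types.ModuleType("demo_lib")
exec(source, demo_lib.__dict__)
sys.modules["demo_lib"] = demo_lib   # make it importable by name

# A star import only copies the names listed in __all__.
namespace = {}
exec("from demo_lib import *", namespace)
star_names = sorted(n for n in namespace if not n.startswith("__"))
print(star_names)               # ['somefunc']

# __all__ is advisory: everything is still reachable by attribute access.
print(demo_lib.helper())        # not exported
print(demo_lib.re.__name__)     # re
```

So `__all__` curates what `import *` pulls in, but it does not restrict access to anything.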

> If it's undocumented in Python, you should pretend it doesn't exist.
	
I don't agree. It's fairly common to dive into third-party packages' code to see what's occurring and to use 'undocumented' things (which are mostly undocumented because the documentation is bad, not because they are hidden away). Just look at the Django `_meta` API, which people relied on because it was the only place you could get some specific model information in a stable way, despite being undocumented and private. Now it's been formalized into a proper API.

Python's extensive use of duck typing also makes it a lot easier to work with undocumented stuff. You can make some wide-ranging changes to internals (changing types completely, turning properties into functions), but as long as it quacks roughly the same, nothing breaks.
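A rough illustration of that point, using two hypothetical versions of a class (`ResponseV1` and `ResponseV2` are invented for this example): the internals change from a plain attribute to a computed property, and the calling code keeps working.

```python
class ResponseV1:
    """Hypothetical first version: stores the value as a plain attribute."""

    def __init__(self, body):
        self.text = body


class ResponseV2:
    """Hypothetical later version: same name, now computed by a property."""

    def __init__(self, body):
        self._raw = body.encode("utf-8")   # internal storage changed to bytes

    @property
    def text(self):
        return self._raw.decode("utf-8")


def shout(response):
    # The caller only relies on "has a .text that behaves like a string".
    return response.text.upper()


print(shout(ResponseV1("hello")))   # HELLO
print(shout(ResponseV2("hello")))   # HELLO
```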

What is useful about 'import lib; lib.re'? I cannot think of a situation in which a direct import wouldn't be better, and a direct import has many obvious advantages. The __init__.py is a special case of course, but you would only use that for a package's own modules.

> It's fairly common to dive into 3rd party packages code to see what's occurring and to use 'undocumented' things

It may be common, but that doesn't convince me that it's a good idea. It seems to me that it would be better if the language forced you to design the public API properly, rather than leaving users to resort to undocumented/private APIs.

> What is useful about 'import lib; lib.re'?

Nothing, but it's a side effect of an awesome feature of Python: nothing being private. Which is incredibly useful. 'lib.re' is exactly the same case as 'lib.actual_library_function', why should Python add the ability to somehow stop these from being included? It would increase complexity for no gain.

You're just repeating that it is awesome and useful without saying why. I think distinguishing public & private variables offers more support for structured programming and is therefore desirable.

> You're just repeating that it is awesome and useful without saying why.

Sorry, I thought you were asking why you are able to import another module's imports.

> I think distinguishing public & private variables offers more support for structured programming and is therefore desirable.

You can prefix attributes and functions with a single underscore to mark them as private, or a double underscore to make them more private (inside a class, the attribute name gets mangled).
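A small sketch of both conventions; the `Account` class and its attributes are made up for illustration:

```python
class Account:
    def __init__(self):
        self._balance = 10      # single underscore: private by convention only
        self.__pin = "1234"     # double underscore: stored as _Account__pin


acct = Account()

# Nothing stops you from reading the "private" attribute.
print(acct._balance)            # 10

# The double-underscore name is mangled, so the plain name is not found...
print(hasattr(acct, "__pin"))   # False

# ...but the mangled name is still accessible if you really want it.
print(acct._Account__pin)       # 1234
```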

Anyway, Python doesn't have an enforced notion of privacy because it's a bad idea. By marking something as private you're saying "I, as a developer sitting here writing this, know better than all of the users of my library. Their lives may depend on using something I haven't exposed properly in my API, but too bad. I know best".

So you end up jumping through ridiculous hoops to access private properties (because even in languages with private, nothing is truly private), all because some guy thought he knew best a long time ago while writing the library you are using.

So a better approach (IMO) is to mark something as private with convention (a prefixed underscore), which means "this is private, don't depend on it", without restricting your access. You can drive a car, have sex, pay taxes, but not access a private variable? Bleugh.

That's more of a cultural thing though. I'm sure enforced privacy makes more sense in statically typed, compiled languages with lots of classes (and even then I would argue it is still bad for the reasons above), and matters more in huge codebases.

How can the language force good design?

Like how would a language that uses explicit exports stop someone exporting everything?

I agree that accessing libraries indirectly probably isn't useful, but I think being able to do dir(lib) and see the namespace that is in use is a good thing (at least in the context of Python).

The language cannot force it but can encourage it. Having to make the choice between private and public is worthwhile I think.

> being able to do dir(lib) and see the namespace that is in use is a good thing.

It is, but it is most useful when what you get is a curated list of members (__all__) intended to be public.
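To make the difference concrete, here is a sketch that compares `dir()` with `__all__` for the standard library's `json` package. The helper `unlisted_public_names` is a hypothetical name written for this example, and the exact list it prints depends on your Python version.

```python
import json


def unlisted_public_names(module):
    """Names that look public in dir(module) but are absent from __all__."""
    exported = set(getattr(module, "__all__", ()))
    return sorted(
        name for name in dir(module)
        if not name.startswith("_") and name not in exported
    )


# The curated list the package author intends to be public.
print(sorted(json.__all__))

# Everything else that dir() shows without an underscore,
# such as submodules pulled in by the package's own imports.
print(unlisted_public_names(json))
```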

> unwritten rule

I don't know if I'd call this (common) practice an unwritten rule in the Python world.

Rather, there's no real notion of visibility, so the only thing you can do to make something 'look private' is name it obscurely (i.e. the underscore prefix, as you mentioned) and leave it undocumented.

It's perhaps my least favorite part of Python module semantics.