by mjr00 411 days ago
> The way these type checkers get fast is usually by not supporting the crazy rich reality of realworld python code.

Or in this case, writing it in Rust...

mypy is written in Python. People have forgotten that Python is really, really slow for CPU-intensive operations. Python's performance may not matter when you're writing web service code and the bottlenecks are database I/O and network calls, but for a tool that's loading up files, parsing them into an AST, etc., it's no surprise that Rust/C/even Go would be an order of magnitude or two faster than Python.

uv and ruff have been fantastic for me. ty is definitely not production ready (I see several bizarre issues on a test codebase, such as claiming `datetime.UTC` doesn't exist) but I trust that Astral will match the "crazy reality" of real Python (which I agree, is very crazy).

5 comments

> such as claiming `datetime.UTC` doesn't exist)

This is a known issue — we're currently defaulting to a conservative Python version, and `datetime.UTC` really doesn't exist until Python 3.11!

https://docs.python.org/3/library/datetime.html#datetime.UTC

We will probably change the default to "most recent supported Python version", but as mentioned elsewhere, this is very early and we're still working out these kinds of kinks!
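
For code that has to run on interpreters older than 3.11, a minimal sketch of the usual workaround is below. It relies only on the standard library; the module-level name `UTC` is just a local alias chosen for illustration.

```python
import datetime
import sys

# datetime.UTC was added in Python 3.11 as an alias for datetime.timezone.utc.
# On older interpreters, fall back to the long-standing spelling.
if sys.version_info >= (3, 11):
    UTC = datetime.UTC
else:
    UTC = datetime.timezone.utc

# Timezone-aware "now" that works on both old and new interpreters.
now = datetime.datetime.now(UTC)
```

A type checker targeting 3.10 or older should accept the `else` branch and skip the `datetime.UTC` access, since the `sys.version_info` comparison tells it which branch applies.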

You should be doing this dynamically based on the version of python you are running against, so that you don't have to hardcode or make such "conservative" choices by hand.
Note that we're not ever spinning up a Python interpreter to run your code, or monitoring an existing running Python process. So we do need some kind of metadata.

But yes, if you have a Python version specified in pyproject.toml, we respect that, and if you have a virtualenv, we can see the Python version that was used to create that. And that's what we use to type-check your code.

The default being discussed here is what we fall back on if that project metadata isn't available.
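
To make the lookup order concrete, here is a rough sketch of that kind of fallback chain. It is not ty's actual implementation: the function name `resolve_python_version`, the `.venv` directory location, the `(3, 9)` fallback, and the simplistic handling of `requires-python` (taking the first `X.Y` it finds) are all assumptions made for illustration.

```python
import re
from pathlib import Path

try:
    import tomllib  # in the standard library from Python 3.11 onward
except ModuleNotFoundError:
    tomllib = None  # sketch simply skips pyproject.toml on older interpreters

FALLBACK = (3, 9)  # hypothetical "conservative" default


def _major_minor(text):
    """Extract the first X.Y pair from a string such as '>=3.10' or '3.12.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def resolve_python_version(project_dir, fallback=FALLBACK):
    root = Path(project_dir)

    # 1. Project metadata: [project] requires-python in pyproject.toml.
    pyproject = root / "pyproject.toml"
    if tomllib is not None and pyproject.is_file():
        with pyproject.open("rb") as f:
            data = tomllib.load(f)
        requires = data.get("project", {}).get("requires-python")
        if requires:
            version = _major_minor(requires)
            if version:
                return version

    # 2. Virtual environment: pyvenv.cfg records the creating interpreter.
    cfg = root / ".venv" / "pyvenv.cfg"
    if cfg.is_file():
        for line in cfg.read_text().splitlines():
            key, _, value = line.partition("=")
            if key.strip() in ("version", "version_info"):
                version = _major_minor(value)
                if version:
                    return version

    # 3. No metadata available: use the default under discussion.
    return fallback
```

No interpreter is started at any point; everything is read from files on disk.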

I think they probably know that, this is alpha software, no need to be condescending.
Criticism isn’t necessarily condescending. “You should be doing x because y” is just a plain assertion, it doesn’t imply any moral judgement or opinion of the author.
They said they will default to some newer version, which indicates they are not planning to do this dynamically.
How is it condescending in any way? I found it to be a constructive criticism; i.e. useful help.
I don't necessarily read it as condescending, but I do read it as presumptuous. What someone "should" do depends on many things. Maybe, because this is software in alpha stage, they should _not_ focus on this part of the code if it is minor compared to other obligations. Or maybe there are other reasons they've chosen not to do this (as was explained in an above comment).

IMO, a less presumptuous criticism would be phrased like "if you did X then benefits Y would happen", or "if you haven't, consider X", or even (the least presumptuous - make it a conversation!) "have you considered X?", rather than "you should do X".

I see what you mean. Perhaps it was just a "poor" choice of words for whatever reasons. I am sure we can assume he intended it in a way of "have you considered X?".
(ty developer here)

Currently we default to our oldest supported Python version, in which `datetime.UTC` really doesn't exist! Use `--python-version 3.12` on the CLI, or add a `ty.toml` with e.g.

```
[environment]
python-version = "3.12"
```

And we'll find `datetime.UTC`.

We've discussed that this is probably the wrong default, and plan to change it.

I realize this might be hard from a technical / architecture standpoint, but it would be great if "does not exist" and "does not exist in this version of Python" were two different errors.

If I saw something like "datetime.UTC doesn't exist", I'd immediately think "wait, was that datetime.utc", not "ooh it got added in 3.11, I need to change my Python version"

I agree that would be nice; probably not near the top of our list right now (and not trivial to implement), but it makes sense. Thanks for the suggestion.
The nontrivial way to do it is to dynamically scan the Python 3.12 namespace and add these warnings.

Is there any big downside to doing it the boring way: hardcoding a list and comparing the error against it?

This information is already maintained via `if sys.version_info >= (...):` conditionals in typeshed stubs. I don't think this is important enough to justify maintaining the same information in a duplicate way.
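
As a rough illustration of how that stub information could drive a more specific message, here is a sketch that reads version guards out of a stub with the standard `ast` module. The `STUB` text is a simplified excerpt written in the style of typeshed, not a copy of it, and `availability` and `explain` are hypothetical helpers, not anything ty provides.

```python
import ast

# Simplified stand-in for a typeshed-style stub (assumption, not the real file).
STUB = """
import sys

class timezone: ...

if sys.version_info >= (3, 11):
    UTC: timezone
"""


def _version_guard(node):
    """Return (major, minor) for `if sys.version_info >= (X, Y):`, else None."""
    if not isinstance(node, ast.If):
        return None
    test = node.test
    if (
        isinstance(test, ast.Compare)
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.GtE)
        and ast.unparse(test.left) == "sys.version_info"
    ):
        return ast.literal_eval(test.comparators[0])
    return None


def _defined_names(statements):
    """Yield names bound by class/def statements or annotated assignments."""
    for stmt in statements:
        if isinstance(stmt, (ast.ClassDef, ast.FunctionDef)):
            yield stmt.name
        elif isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            yield stmt.target.id


def availability(stub_source):
    """Map each top-level name to the minimum version that defines it."""
    table = {}
    for node in ast.parse(stub_source).body:
        guard = _version_guard(node)
        if guard is not None:
            for name in _defined_names(node.body):
                table[name] = guard
        else:
            for name in _defined_names([node]):
                table[name] = (0, 0)  # unguarded: available everywhere
    return table


def explain(table, name, target):
    """Return a diagnostic string, or None if `name` exists for `target`."""
    required = table.get(name)
    if required is None:
        return f"{name} does not exist"
    if target < required:
        return f"{name} requires Python {required[0]}.{required[1]} or newer"
    return None
```

With this table, `explain(availability(STUB), "UTC", (3, 9))` reports a version requirement, while a misspelling like `"utc"` still gets the plain "does not exist" message.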
aha makes sense! Yeah it'd be nice if you could divine the intended python version from the uv configuration/`.python-version`. Thanks for all your hard work, looking forward to the full release!
Defaulting is wrong: what is checked is the aggregate of actual user code, standard library for a given Python version and installed packages. It has to be the same environment as when the program is run, leaving conservative approximations (checking types with the oldest supported library versions and hoping newer ones are OK) to the user.
Yes, if you have a Python version specified in pyproject.toml, for instance, we respect that, and that's what we use to type-check your code. The default being discussed here is what we fall back on if that project metadata isn't available.
Could you check what version of `python` is in the PATH and use that as the default?
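
A sketch of what that suggestion could look like, using only the standard library. Unlike reading metadata files, this does start an interpreter briefly; the helper name `path_python_version` is made up for illustration and is not something ty offers.

```python
import shutil
import subprocess


def path_python_version(executable_name="python"):
    """Return (major, minor) of the named interpreter on PATH, or None."""
    exe = shutil.which(executable_name)
    if exe is None:
        return None
    # Ask the interpreter for its own version; %-formatting keeps the
    # one-liner valid on very old interpreters too.
    result = subprocess.run(
        [exe, "-c", "import sys; print('%d %d' % sys.version_info[:2])"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    if result.returncode != 0:
        return None
    major, minor = result.stdout.split()
    return int(major), int(minor)
```

The catch is that whichever `python` happens to be first on PATH is not necessarily the interpreter the project runs under, so this is only a guess.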
Python is slow for some CPU-intensive operations.

There are some extremely CPU-intensive low-level operations that you can easily write in C and expose as a Python API, like what Numpy and Pandas do. You can then write really efficient algorithms in pure Python. As long as those low-level operations are fast, those Python-only algorithms will also be fast.

I don't think this is necessarily "cheating" or "just calling disguised C functions." As an example, you can write an efficient linear regression algorithm with Numpy, even though there's nothing in Numpy that supports linear regression specifically, it's just one of the ways a Python programmer can arrange Numpy's low-level primitives. If you invent some new numerical algorithm to solve some esoteric problem in chemistry, you may be able to implement it efficiently in Python too, even if you're literally the first person ever writing it in any language.
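
To illustrate the linear regression point, here is a minimal sketch built from general-purpose NumPy primitives (`column_stack` and the least-squares solver `np.linalg.lstsq`). The function name `fit_linear_regression` is invented for this example, and it assumes NumPy is installed.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # The heavy lifting happens inside NumPy's compiled least-squares routine.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]


x = np.arange(10.0)
intercept, slope = fit_linear_regression(x, 2.0 * x + 1.0)
```

The Python code here is only a few lines of glue; there is no per-element Python loop anywhere.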

The actual problem is that it's hard for people to get an intuition for which Python operations can be made fast and which can't; AST and file manipulation are sadly in the latter group.

That is a confusing way to look at it. Python is slow, C is fast. If your python code is calling functions that were not written in Python (even if it is indirectly thru a library you are using), that is not "pure python".
That works in numerical libraries because you can encapsulate the loops into basic operations that you then lower to C. In a domain like type checking it's not nearly as easy/doable.
> As long as those low-level operations are fast, those Python-only algorithms will also be fast.

Only if you spend more time on the C implementations than on Python. If you have pure Python loops, you'll be slow. You need quite high-level components and minimal Python glue for it to be fast.

CPU intensive is not quite the right metric. What python is slow at is all the extra administration that comes with basic stuff like accessing attributes and function calls.

This gives somewhat counterintuitive results where declaring and summing a whole list of integers in memory can be faster than a simple for loop with an iterator.

But yeah writing stuff in a different (compiled) language is often better if that means the python interpreter doesn't need to go through as many steps.
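
A small way to try the list-versus-loop comparison yourself, using only `timeit` from the standard library. The function names and the size `N` are arbitrary choices for this sketch, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

N = 100_000


def loop_sum(n):
    """Explicit for loop: each iteration runs several bytecode instructions."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Materialize the whole list, then let the C-implemented sum() iterate."""
    return sum(list(range(n)))


if __name__ == "__main__":
    print("for loop :", timeit.timeit(lambda: loop_sum(N), number=20))
    print("sum(list):", timeit.timeit(lambda: builtin_sum(N), number=20))
```

Both functions compute the same value; the difference is how much of the work passes through the interpreter's bytecode loop.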

mypy is compiled using mypyc. It does not run as Python code.
The semantics of Python make it problematic to run at speed; it is not just about interpreted vs. compiled code. Given the high level of dynamic behavior that is allowed, a JIT (like PyPy) has a higher chance of getting decent performance if the code has an underlying behavior that can be extracted.
mypy is also written in a style conducive to speedups when compiled with mypyc.