Why would it decrease single thread performance? How is python different than other languages that support native full-fledged multi-threading, eg Java, Go, C#?
A big part is that Python uses reference counting GC. Java, Go, C# all use tracing GC. Py_INCREF and Py_DECREF are responsible for inc/decreasing the reference count, and are not atomic. The GIL ensures refcount safety by allowing only one thread access to changing refcount. The naive approach to parallelization would require locking each ref inc/dec. There are some more sophisticated approaches (thanks to work by Sam Gross et al) that avoid a mutex hit for every inc/dec.
Tracing GC does not run into this problem. Why Python doesn't use tracing GC is not something I am qualified to answer.
I am by no means knowledgeable enough on the topic, but Swift has similar problem domain, and afaik only uses atomic ref counts for objects that “escape” from a given thread - is there a reason something like that wouldn’t work for python as well?
python made it's C api visible, so things like reference counting are widely observed by C libraries that interop with python. This makes it much harder to make changes since you can't change the implementation in ways that programs rely on.
Tracing GC does not run into this problem. Why Python doesn't use tracing GC is not something I am qualified to answer.
Sam Gross' work: https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsD...
The GIL code: https://github.com/python/cpython/blob/main/Python/ceval_gil...
Py_INCREF: https://github.com/python/cpython/blob/a4460f2eb8b9db46a9bce...