Hacker News
by jerf 3137 days ago
"Can someone explain why is "having a runtime" problematic for writing extensions and calling them from Python ?"

Perhaps instead of saying "having a runtime" it would be better to examine the situation in terms of what the code assumes. Python assumes that it has the Python GC running on its code, that everything is a PyObject of one sort or another, that it has a Global Interpreter Lock that if taken will prevent anything from modifying anything it thinks it owns, and so on. Go assumes that it has the Go GC running (despite both "having GC", there are enough differences that it must be specified as a difference), that its objects are laid out in certain manners such that most field references are compiled down to static offsets rather than dynamic lookups, that it can run its core event loop and dispatch out work to its internal goroutines without asking anyone else, etc.

You could go on for quite a while; I don't intend those as complete lists. I just want to convey the flavor of conceptualizing the runtime in terms of assumptions that the code running in that runtime can make.

Once you look at it this way, it should be more clear why trying to jam two runtimes into one OS process gets to be tricky. I use the word "jam" quite carefully, because it always feels that way to me. The more differences between the assumptions of the two runtimes, the more translation the code is going to need. For instance, Python to anything else is going to involve unwrapping the data from the internal PyObject wrappers, and wrapping anything coming back from somewhere else back into PyObjects. Threading models have to be matched up. Memory layout has to be harmonized. Memory generally has to be kept strictly separated, because the two runtimes both expect to be able to manage memory, so you can't hand memory allocated by one of them to the other, which further implies that you're almost certainly copying everything across the boundary. Etc. etc.
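
To make the wrapping and unwrapping concrete, here is a minimal sketch using the standard ctypes module. It is not taken from the code under discussion; the variable names are made up for illustration, and the object size it reports is specific to CPython.

```python
import ctypes
import sys

py_value = 12345                       # a full Python object, header and all
c_value = ctypes.c_int32(py_value)     # "unwrap": copy the number into a raw 4-byte C integer

object_size = sys.getsizeof(py_value)  # bytes used by the Python object (CPython-specific)
payload_size = ctypes.sizeof(c_value)  # bytes the foreign side actually wants

# "Wrap" on the way back: reading .value builds a Python int again.
round_tripped = c_value.value

# The two sides hold independent copies; changing one does not touch the other.
c_value.value = 7
print(object_size, payload_size, round_tripped, py_value)
```

Even for a single integer, the value gets copied once going out and once coming back, which is the "copying everything across the boundary" cost in miniature.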

I'd also separate out the way there can be differences in the affordances of the languages. For instance, Python doesn't have what Rust or Go would call "arrays". Rust and Go are fine with getting arrays of pointers, but the languages afford the use of memory-contiguous arrays without pointers, so especially if you're integrating with a third-party library, some layer somewhere along the way has no choice but to convert Python lists into the correct sort of array. The runtimes technically don't force this, but the structure of the libraries and code afforded by the other languages does. By contrast, if you were integrating with Lisp, you might find many points where you need to turn things into singly-linked lists, again, not because Lisp can't handle arrays, but because you're likely to encounter pre-existing Lisp code that expects Lisp cons lists.
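
As a rough illustration of that list-to-array conversion, here is a sketch using only the standard library's array module. It assumes the foreign library wants a contiguous buffer of C doubles; a real binding would do the equivalent step in its glue layer, and the names here are invented for the example.

```python
from array import array

py_list = [1.0, 2.5, 4.0]        # list of references to separate float objects
packed = array("d", py_list)     # copy the values into one contiguous buffer of C doubles

view = memoryview(packed)        # roughly what a C, Rust, or Go library would be handed
total_bytes = view.nbytes        # 3 items * 8 bytes each, no per-item object headers

py_list[0] = 99.0                # the conversion was a copy, so the buffer is unaffected
print(packed.tolist(), total_bytes, view.contiguous)
```

The point is that the conversion is a copy into a different layout, not a reinterpretation of the list in place.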

As another example, despite the fact Go and C generally see eye-to-eye on how to lay out structs, the C support from Go is still extremely expensive due to the need to convert from how Go sees the concurrency world to how C sees the world. C, contrary to popular belief, actually does have a runtime, and that runtime tends to assume it has very deep control of the OS process it is running in. Go has to do a lot of work to isolate the running C code in an environment it is comfortable with, where it won't be pre-empted by the green thread code (on account of the fact that it can't be, C doesn't support that). There's also some tricksy code you may need to write to harmonize C's memory-management-via-malloc model with Go's "lifetimes determined via the GC" model. (If you listen carefully, you can hear the Go runtime go "klunk" every time it runs cgo code.)

Rust has a runtime too, but unlike a lot of languages, it has the ability to shut it off. You lose some services and capabilities, but on the upside, you significantly reduce the number of assumptions the Rust code is making, making it easier to integrate with other runtimes. (I say reduce because technically, it still doesn't make it to zero if you are precise enough in your thinking, but I'd expect that of all the current "cool" languages, Rust with the runtime off probably makes fewer assumptions than anything else.) That said, I'm not sure if this code is working in that mode. I see the Rust code doesn't directly turn off the runtime, but I don't know what that "#[macro_use] extern crate cpython;" line fully expands to. It's possible that the full Rust runtime is still in play, which looks enough like C anyhow (by explicit design of the Rust team) that Python's existing C integration can just be reused. Either way, Rust is still making many fewer assumptions than Go's relatively heavyweight (in terms of assumptions more so than resources) runtime.

2 comments

Rust's runtime is basically the same weight as C, that is, crt: https://github.com/rust-lang/rust/blob/master/src/libstd/rt....

What you're talking about is more of dropping the standard library.

> C, contrary to popular belief, actually does have a runtime

I've been left wondering what you meant by this. Are you referring to the stack and heap management? Or OS processes and threads?

If not, could you please explain what you mean by C runtime, and how Rust differs from it when its runtime is shut off?

"could you please explain what you mean by C runtime,"

There are two components to the C runtime: what is specified by the C standard, and what is specified by POSIX and the operating systems. I am not sufficiently familiar with the C world to tell you exactly which thing is defined in which part. Fortunately, for this discussion of how integrating C code into another runtime goes, it doesn't really matter.

The C runtime includes the assumption that there is a malloc-compatible memory allocator available (note it's swappable), the process of linking programs when they start up, and the whole surrounding machinery of "symbols" those programs can look up. It has certain assumptions about what state needs to be saved when a function is called; for instance, it won't save the flags on the processor controlling IEEE FPU conformity. Function calls have a "stack" and there's a "heap", and the language itself distinguishes between them. C itself, IIRC, has no specification for threads whatsoever, but the OSes seem to have converged on a fairly similar model that could be fairly called part of the runtime now.

It's hard to "see" the C runtime because it has won so thoroughly that it just looks like "how computation is done", or is so deeply integrated into the operating system that it forces parts of the model on everything that runs on that OS. You kind of have to piece together what C does by looking at what it does that other languages do differently. Yes, most programs at some point will do some linking and symbol resolution, but once the interpreter has started up, dynamic languages have no concept of a static symbol table. Loading another Python module doesn't even remotely resemble loading a C library, either at startup or dynamically later. The language Go doesn't have a stack or a heap. The implementation does for practical reasons, but the language does not. Most other languages now will save the same things on the call stack as C, but that's not a requirement of computation; you could save a lot more of the processor's state, but it'll trash your function performance to do it. A "stack" and "heap" model is not necessary; Haskell for instance does not have a clear "stack" at all. (It does stack-like things, certainly, but it turns out getting what most people call "a stacktrace" from the runtime is actually fairly hard. I believe still not possible on GHC.) There are alternate methods for threading, including models that still use the C-style threads under the hood but include mandatory code to be run at startup and shutdown to be "part" of the runtime.
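
To illustrate the point about dynamic languages having no static symbol table, here is a small sketch in Python. The module name "demo_module" and its "greet" function are hypothetical, built at runtime purely for the example; nothing here comes from the code under discussion.

```python
import importlib
import sys
import types

demo = types.ModuleType("demo_module")   # hypothetical module, built at runtime
demo.greet = lambda: "hello"
sys.modules["demo_module"] = demo        # "loading" here is just registering an object in a dict

loaded = importlib.import_module("demo_module")  # found in sys.modules; no linker involved
first = loaded.greet()

demo.greet = lambda: "rebound"           # names are looked up on each call, so they can be rebound
second = loaded.greet()
print(first, second)
```

Compare that with a C program, where the name would normally be resolved to an address by the linker or loader and then stay put.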

C is not as thin as it looks; it's just that history has made it appear to be the baseline. And as I know my internets, let me say that nothing in this post is criticism. Something has to be the baseline. While I think the C baseline is getting long in the tooth, it won for a reason, and I don't know that we could have gotten much better from the 1970s. (The other competition usually cited was either a performance non-starter (the Lisp of the time), or had it survived for 40+ years, we'd be able to write a very similar post about how it is getting long in the tooth too in 2017 (Pascal, for instance).)

A great answer!

> I am not sufficiently familiar with the C world to tell you exactly which thing is defined in which part.

I've got some bits of knowledge here. I could be wrong, as it's not my expertise...

> Function calls have a "stack" and there's a "heap", and the language itself distinguishes between them.

I don't believe this is true, or at least not literally, but the details are interesting! http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf is what I usually go by when talking about C11. The malloc function is defined in 7.22.3.4, which says:

> The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

In 7.22.3, the overview for all the memory functions, it says stuff like

> The lifetime of an allocated object extends from the allocation until the deallocation.

which restricts how you can implement it, of course, but it doesn't use the words "heap" and "stack" at all; "stack" is never mentioned in the document. 6.2.4 talks about storage durations; this is usually what we think about when we talk about "stack" and "heap" and such. "Stack allocated" is more properly termed "automatic storage duration", and "heap allocated" is "allocated storage duration".

This is a side effect of the fact that C itself is defined in terms of a virtual machine! They call it the "abstract machine".

Anyway, all of this is in service of your point about history and such. Many people just assume all of this is how it has to be, rather than something that came to be thanks to history. It's all very interesting!

> C itself, IIRC, has no specification for threads whatsoever

C11 added this, actually, but before that, you're 100% right.

Thank you for the elaboration. Now that you remind me, I remember C11, which also adds "the C memory model" as part of the runtime, IIRC. Other languages have different memory models, usually simpler, though it's hard to hold that against C11 since it was in the unenviable position of trying to codify decades of implicit and divergent practice in one of the trickiest places in software engineering.