The JVM is a specification which describes a pretend computer and its instruction set.
This TruffleC doesn't translate C to Java and run a Java program. This compiles C to bytecode which operates on the JVM.
Whatever Java does or doesn't support is irrelevant to this compiler. TruffleC has nothing to do with the Java programming language at all.
Just like you can compile C and get a memory address of a stack or heap location on any physical computer supported by a C compiler, likewise you can compile C with TruffleC and get a memory address within the stack or heap of the pretend computer called the JVM.
This must be how it works, unless the JVM itself has no concept of memory addresses, which seems very unlikely to me. Let me know if I'm wrong.
> This compiles C to bytecode which operates on the JVM.
No, it compiles C to an AST, which it then interprets. The AST, which is also the interpreter in the Truffle design, is then partially evaluated to produce machine code. No bytecode is generated at any point, and in fact you can run it on a JVM that doesn't use bytecode, in which case there is no bytecode anywhere.
I learned most of what I know about Truffle and Graal from your blog posts, so you obviously know more about this than me. However, I was under the impression that Truffle is quite closely integrated into GraalVM, that is, you can't use Truffle on a different JVM. Is that not true?
Not so. Truffle is just a Java library like any other, so you can run Truffle languages on any JVM. However, they will then run slowly, since they are just interpreters. To get the speedups you need to use Graal, which recognizes Truffle as a library and treats it specially.
The JVM bytecode does not have any memory address type, just integers and floats of various widths, plus references to managed heap objects. Arbitrary pointers would have to be represented as 'long's one way or another.
Sure, but don't discount all of the JIT optimizations that were implemented in the JVM and the huge number of engineer years invested in that particular implementation/runtime...
I guess a really brute-force way would be to have a huge dictionary mapping from "memory address" (really just an arbitrary number) to JVM object. malloc() would add an entry to the dictionary and free() would remove one. A pointer dereference would look up the address in it, but would need to find the nearest lower entry (for when you have an array and dereference an element in it, or use a pointer to a field in a struct).
I would hope that there's a much more efficient way to do it; this idea is just evidence that it could be done in principle. But I don't see what that more efficient way would be. You certainly need to keep a secret reference to each JVM object somehow, because C doesn't require you to keep any pointer to an object, e.g.:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    intptr_t x = (intptr_t)malloc(sizeof(int));
    *(int*)x = 99;
    bool did_subtract_50 = false;
    if (x > 50) {
        did_subtract_50 = true;
        x -= 50;
    }
    // Now there is no pointer or even integer that contains the address
    // ... later ...
    // Retrieve the address and use and free it
    int* y = (int*)(x + 50 * did_subtract_50);
    printf("value: %d\n", *y);
    free(y);
    return 0;
}
It can be done in a few different ways. Native memory can be managed as plain native memory (under the hood you can use Unsafe to access that memory), but the real advantage is that pointers to many objects can be kept as managed pointers and not converted to a native value most of the time. For example, Ruby C extensions often use VALUEs to refer to Ruby objects, which are normally tagged pointers. In TruffleRuby we use ValueWrapper objects to represent these, and maintain a fast map between native values and these objects when necessary.
Well-behaved uses of pointers according to the C standard can be implemented by whatever means fit best. Fat pointers with metadata about the destination, plus a huge block of memory for generic cases, come to mind. The rest is undefined behavior, where the runtime can just nuke the program, a.k.a. segfaulting.