by JavaOnlyGuy 1403 days ago
How are pointers implemented in a language that doesn't support them?
4 comments

I don't think this is how it works.

The JVM is a specification which describes a pretend computer and its instruction set.

This TruffleC doesn't translate C to Java and run a Java program. This compiles C to bytecode which operates on the JVM.

Whatever Java does or doesn't support is irrelevant to this compiler. TruffleC has nothing to do with the Java programming language at all.

Just like you can compile C and get a memory address of a stack or heap location on any physical computer supported by a C compiler, likewise you can compile C with TruffleC and get a memory address within the stack or heap of the pretend computer called the JVM.

This must be how it works, unless the JVM itself has no concept of memory addresses, which seems very unlikely to me. Let me know if I am wrong?

> This compiles C to bytecode which operates on the JVM.

No, it compiles C to an AST, which it then interprets. The AST, which is also the interpreter in the Truffle design, is then partially evaluated to produce machine code. No bytecode is generated at any point, and in fact you can run it on a JVM that doesn't use bytecode, and then there is no bytecode anywhere.
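
To make "the AST is also the interpreter" concrete, here is a minimal sketch in plain Java. The class names (`AstInterpreterSketch`, `Node`, `Const`, `ReadLocal`, `Add`) are hypothetical and are not Truffle's API; the only point is that each node executes itself, so no bytecode has to be generated for the guest program.

```java
// Hypothetical sketch: these classes are illustrative and are not Truffle's API.
final class AstInterpreterSketch {
    /** Every node can execute itself, so the tree doubles as the interpreter. */
    abstract static class Node {
        abstract int execute(int[] frame);
    }

    /** An integer literal. */
    static final class Const extends Node {
        private final int value;
        Const(int value) { this.value = value; }
        @Override int execute(int[] frame) { return value; }
    }

    /** Reads a local variable from a frame slot. */
    static final class ReadLocal extends Node {
        private final int slot;
        ReadLocal(int slot) { this.slot = slot; }
        @Override int execute(int[] frame) { return frame[slot]; }
    }

    /** Adds the results of its two children. */
    static final class Add extends Node {
        private final Node left;
        private final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override int execute(int[] frame) {
            return left.execute(frame) + right.execute(frame);
        }
    }

    /** Builds the tree for the C expression `x + 1` and runs it with x in slot 0. */
    static int run(int x) {
        Node tree = new Add(new ReadLocal(0), new Const(1));
        return tree.execute(new int[] { x });
    }

    public static void main(String[] args) {
        System.out.println(run(41)); // prints 42
    }
}
```

Partial evaluation, as described above, would specialize `execute` for one fixed tree; this sketch only shows the interpreter half.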

I learned most of what I know about Truffle and Graal from your blog posts, so you obviously know more about this than me. However, I was under the impression that Truffle is quite closely integrated into GraalVM, that is, you can't use Truffle on a different JVM. Is that not true?
Not so. Truffle is just a Java library like any other. You can therefore run Truffle languages on any JVM. However, they will run slow as they are just interpreters, then. To get the speedups you need to use Graal, which recognizes Truffle as a library and treats it specially.
Well, OK, sure, but Truffle without partial evaluation is just an interpreter written in a very particular way...

I see what you mean though, thanks!

> Well, OK, sure, but Truffle without partial evaluation is just an interpreter written in a very particular way...

That's what it was to start with. Partial evaluation came later.

Truffle and partial evaluation also work on native-image. You could say this is a VM where there are no bytecodes anymore.
Oh, of course, but native-image is still a Graal feature, and I was asking about Truffle without Graal.
native-image was created as part of the Graal project but I think it's a separate JVM implementation from GraalVM
the JVM bytecode does not have any memory address type. Just various width integers & floats, and references to managed heap objects. Arbitrary pointers would have to be done with 'long's one way or another.
You can still use pointers. It's a bit hidden, but there are things like `Unsafe.allocateMemory`, `Unsafe.getByte` and so on ;)
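
A rough sketch of what that looks like, assuming a JDK where the unsupported `sun.misc.Unsafe` class is still reachable through reflection on its private `theUnsafe` field (newer JDKs warn about these methods and may eventually remove them). The class and method names `UnsafePointerSketch` and `roundTrip` are made up for illustration; the `Unsafe` calls themselves (`allocateMemory`, `putInt`, `getInt`, `freeMemory`) are real. The "pointer" is just a `long`, as the parent comment says.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: relies on the unsupported sun.misc.Unsafe API.
final class UnsafePointerSketch {
    /** Fetches the Unsafe singleton reflectively (an assumption about the running JDK). */
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Stores a value through a raw address and reads it back. */
    static int roundTrip(int value) {
        Unsafe unsafe = loadUnsafe();
        long p = unsafe.allocateMemory(8);   // like malloc(2 * sizeof(int))
        try {
            long second = p + 4;             // pointer arithmetic on a plain long
            unsafe.putInt(second, value);    // like ((int*)p)[1] = value
            return unsafe.getInt(second);    // like ((int*)p)[1]
        } finally {
            unsafe.freeMemory(p);            // like free(p)
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(99));
    }
}
```

Nothing here is checked by the JVM: a bad address crashes the process just like it would in C.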
right; at which point the subset of jvm you're using is a subset of any other IR/VM, the 'j' in 'jvm' being only useful as an implementation/runtime.
Sure, but don't discount all of the JIT optimizations that were implemented in the JVM and the huge number of engineer years invested in that particular implementation/runtime...
I guess a really brute force way would be to have a huge dictionary mapping from "memory address" (really just an arbitrary number) to JVM object. malloc() would add to the dictionary and free() would remove an entry. Pointer dereference would look up in it but would need to be able to find the nearest lower entry (for when you have an array and dereference an entry in it, or use a pointer to a field in a struct).
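
For what it's worth, the brute-force dictionary described above could be sketched with a `TreeMap`, whose `floorEntry` gives the "nearest lower entry" lookup. Everything here is hypothetical (the class name, the fake address layout, blocks holding only `int`s), and it is not a claim about how TruffleC does it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "huge dictionary" idea; not how TruffleC works.
final class AddressMapSketch {
    private static final int INT_SIZE = 4;
    private final TreeMap<Long, int[]> blocks = new TreeMap<>();
    private long nextAddress = 0x1000; // fake addresses, never real memory

    /** Like malloc(count * sizeof(int)): registers a block, returns its fake address. */
    long malloc(int count) {
        long address = nextAddress;
        blocks.put(address, new int[count]);
        nextAddress += (long) count * INT_SIZE + 16; // gap so blocks never touch
        return address;
    }

    /** Like free(): forgets the block that starts exactly at this address. */
    void free(long address) {
        if (blocks.remove(address) == null) {
            throw new IllegalStateException("free of unknown address " + address);
        }
    }

    /** Like *(int*)address. */
    int load(long address) {
        Map.Entry<Long, int[]> block = blockContaining(address);
        return block.getValue()[indexIn(block, address)];
    }

    /** Like *(int*)address = value. */
    void store(long address, int value) {
        Map.Entry<Long, int[]> block = blockContaining(address);
        block.getValue()[indexIn(block, address)] = value;
    }

    /** Finds the nearest block that starts at or below the address. */
    private Map.Entry<Long, int[]> blockContaining(long address) {
        Map.Entry<Long, int[]> block = blocks.floorEntry(address);
        if (block == null) {
            throw new IllegalStateException("wild pointer " + address);
        }
        return block;
    }

    /** Converts an address into an index, rejecting misaligned or out-of-range ones. */
    private static int indexIn(Map.Entry<Long, int[]> block, long address) {
        long offset = address - block.getKey();
        if (offset % INT_SIZE != 0 || offset / INT_SIZE >= block.getValue().length) {
            throw new IllegalStateException("wild pointer " + address);
        }
        return (int) (offset / INT_SIZE);
    }

    public static void main(String[] args) {
        AddressMapSketch memory = new AddressMapSketch();
        long p = memory.malloc(4);
        memory.store(p + 8, 99);                 // p[2] = 99
        System.out.println(memory.load(p + 8));  // prints 99
        memory.free(p);
    }
}
```

Because the address is only a key, it survives being hidden in an integer and recomputed later, at the cost of a tree lookup on every dereference.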

I would hope that there's a much more efficient way to do it, this idea is just evidence that it could be done in principle. But I don't see what that more efficient way would be. You certainly need to keep a secret reference to each JVM object somehow because C doesn't require you to keep any pointer to an object e.g.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        intptr_t x = (intptr_t)malloc(sizeof(int));
        *(int*)x = 99;
        bool did_subtract_50 = false;
        if (x > 50) {
            did_subtract_50 = true;
            x -= 50;
        }
        // Now there is no pointer or even integer that contains the address

        // ... later ...
        // Retrieve the address and use and free it
        int* y = (int*)(x + 50 * did_subtract_50);
        printf("value: %d\n", *y);
        free(y);
        return 0;
    }
A class wrapping a long value with the pointer address in it.
Ha! Your paper is a "highly influential citation"

https://www.semanticscholar.org/paper/TruffleC%3A-dynamic-ex...

The side-bar says 'highly influential' but the badge lower down says 'highly influenced' which sounds like a bad thing doesn't it?
Probably meant as "[this paper has] highly influenced [citing paper]".
Semantic Scholar is calling out when it thinks the researchers were using drugs.
How is the C memory modelled? One big Java array, or are there multiple data-structures?

For instance, what happens when you call a function-pointer?

> How is the C memory modelled?

Using a combination of native memory and JVM managed memory, depending on what the memory is needed for.

> For instance, what happens when you call a function-pointer?

This is a good example - because TruffleC can inline-cache a function-pointer, inlining the called function!

All this is in the linked paper, of course.
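
As a rough illustration of the function-pointer inline cache mentioned above (not TruffleC's actual code), here is a monomorphic cache in plain Java. The names `InlineCacheSketch`, `CallSite`, and `demoSite` are hypothetical, and modelling a function pointer as an index into a table is an assumption made for the sketch. The idea is that each call site remembers the last target it saw behind a cheap guard; a JIT can then treat the guarded call as a direct call and inline it.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of a monomorphic inline cache; not TruffleC's actual code.
final class InlineCacheSketch {
    /** One indirect call site, such as `fp(arg)` in C. */
    static final class CallSite {
        private final IntUnaryOperator[] functionTable; // a "function pointer" is an index
        private int cachedPointer = -1;                 // nothing cached yet
        private IntUnaryOperator cachedTarget;
        int misses;                                     // how often the guard failed

        CallSite(IntUnaryOperator[] functionTable) {
            this.functionTable = functionTable;
        }

        int call(int functionPointer, int arg) {
            if (functionPointer != cachedPointer) {
                // Slow path: resolve the target and remember it for next time.
                cachedTarget = functionTable[functionPointer];
                cachedPointer = functionPointer;
                misses++;
            }
            // Fast path: the target is known, so a compiler could inline it here.
            return cachedTarget.applyAsInt(arg);
        }
    }

    /** A call site over two functions: index 0 increments, index 1 doubles. */
    static CallSite demoSite() {
        return new CallSite(new IntUnaryOperator[] { x -> x + 1, x -> x * 2 });
    }

    public static void main(String[] args) {
        CallSite site = demoSite();
        System.out.println(site.call(0, 1));
        System.out.println(site.call(0, 2));
        System.out.println("misses: " + site.misses);
    }
}
```

A real implementation would also have to handle call sites that see many different targets, which this sketch ignores.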

It can be done in a few different ways. Native memory can be managed as plain native memory (under the hood you can use Unsafe to access that memory) but the real advantage is that pointers to many objects can be kept as managed pointers and not converted to a native value most of the time. For example Ruby C extensions often use VALUEs to refer to Ruby objects which are normally tagged pointers. In TruffleRuby we use ValueWrapper objects to represent these, and maintain a fast map between native values and these objects when necessary.
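
A minimal sketch of the "map between native values and objects, only when necessary" idea, with the caveat that this is not TruffleRuby's `ValueWrapper` code. The class `HandleTableSketch` and its methods are hypothetical. An object only gets a numeric handle the first time native code actually needs one, and stays an ordinary managed reference otherwise.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of lazily assigned native handles; not TruffleRuby's code.
final class HandleTableSketch {
    private final Map<Object, Long> handleByObject = new IdentityHashMap<>();
    private final Map<Long, Object> objectByHandle = new HashMap<>();
    private long nextHandle = 8; // 0 is kept free so it can stand in for NULL

    /** Returns the object's handle, assigning one only on first request. */
    long toNative(Object object) {
        Long existing = handleByObject.get(object);
        if (existing != null) {
            return existing;
        }
        long handle = nextHandle;
        nextHandle += 8;
        handleByObject.put(object, handle);
        objectByHandle.put(handle, object);
        return handle;
    }

    /** Maps a handle received from native code back to the managed object. */
    Object toManaged(long handle) {
        Object object = objectByHandle.get(handle);
        if (object == null) {
            throw new IllegalStateException("unknown handle " + handle);
        }
        return object;
    }

    /** Number of objects that have ever needed a native value. */
    int size() {
        return objectByHandle.size();
    }

    public static void main(String[] args) {
        HandleTableSketch table = new HandleTableSketch();
        Object value = new Object();
        long handle = table.toNative(value);
        System.out.println(table.toManaged(handle) == value); // prints true
    }
}
```

This sketch never releases handles, so it keeps every converted object alive; a real runtime would need some way to drop entries.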
Well-behaved usages of pointers according to the C standard can be implemented by whatever means fit best. Fat pointers with metadata about the destination and a huge block of memory for generic cases come to mind. The rest is undefined behavior where the runtime can just nuke the program, aka segfaulting.
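
A small sketch of the fat-pointer idea for the well-behaved case, with hypothetical names (`FatPointerSketch`, `IntPtr`) and the simplifying assumption that every allocation is an `int` array. The pointer carries a reference to its destination plus an offset, and anything outside the allocation is treated as the runtime's cue to stop the program.

```java
// Hypothetical sketch of a fat pointer; an assumption, not any runtime's real design.
final class FatPointerSketch {
    /** A pointer to int: the allocation it points into, plus an element offset. */
    static final class IntPtr {
        private final int[] base;
        private final int offset;

        IntPtr(int[] base, int offset) {
            this.base = base;
            this.offset = offset;
        }

        /** Like p + n in C; forming the pointer is allowed, using it is checked. */
        IntPtr plus(int n) {
            return new IntPtr(base, offset + n);
        }

        /** Like *p. */
        int load() {
            check();
            return base[offset];
        }

        /** Like *p = value. */
        void store(int value) {
            check();
            base[offset] = value;
        }

        /** Undefined behavior in C becomes a deliberate failure here. */
        private void check() {
            if (offset < 0 || offset >= base.length) {
                throw new IllegalStateException("segfault: offset " + offset);
            }
        }
    }

    /** Like malloc(count * sizeof(int)). */
    static IntPtr malloc(int count) {
        return new IntPtr(new int[count], 0);
    }

    public static void main(String[] args) {
        IntPtr p = malloc(4);
        p.plus(2).store(7);                  // p[2] = 7
        System.out.println(p.plus(2).load()); // prints 7
    }
}
```

There is no `free` here because the garbage collector reclaims the array once no fat pointer refers to it; casts to integers would fall into the "huge block of memory for generic cases" path, which the sketch leaves out.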