by al_james 2944 days ago
What I can't work out from both the article and the comments here: from an application point of view, do I use this like I use memory, or do I use it like I use a disk?

No matter how fast a disk is, using it means either some expensive serialization/deserialization step (and the associated memory access to create the 'working' object that my logic actually operates on), or writing my algorithms to forgo in-memory objects (and the associated features offered by my programming language, e.g. classes / objects or whatever) and working from the raw byte values.

What I really want, and what would be a game changer for how we use things, is for my programming language's heap to be made persistent (or at least a part of it). In this case instead of:

  var mything = new Thing();
  load_thing_from_disk(mything);
I might have:

  persistent var mything = new Thing();
Done. However, this also introduces more questions, like transactional commits to memory (few apps are coded to ensure consistency of memory across reboots).

However, I can't help thinking that some way to harness persistent fast memory without needing some complex disk->logic mapping would be a game changer.

Edited: spelling and wording

3 comments

Disclaimer: I work at Intel on PMDK (pmem.io)

It is the game changer that you wish for, since the marshaling logic that you mention is gone. Persistent Memory can be accessed directly through a memory-mapped file, bypassing the traditional read()/write() I/O paths. Recent file systems have also been modified to a) skip the page cache layer and b) forgo the msync() call that would otherwise be required to synchronize the modified pages. This is what's called DAX (Direct Access [0]). In place of msync() you can now just use CPU cache flush instructions. These two file system changes entirely eliminate kernel code from the I/O path (apart from the initial page faults).
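
To make the load/store access pattern concrete, here is a rough sketch. It assumes a Linux/POSIX system and an ordinary file rather than a DAX mount, so it still calls msync(); on a DAX mapping that call is the step cache flush instructions would stand in for. The `Counter` struct and the `bump_counter` function are hypothetical names made up for this illustration, not part of PMDK.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical record that lives directly inside the mapped file.
struct Counter {
    std::uint64_t value;
};

// Maps the file at `path`, increments the counter in place and returns the
// new value. Returns -1 on any error.
long long bump_counter(const char* path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    // A newly created file is extended with zero bytes, so the counter starts at 0.
    if (ftruncate(fd, sizeof(Counter)) != 0) {
        close(fd);
        return -1;
    }

    void* addr = mmap(nullptr, sizeof(Counter), PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    auto* counter = static_cast<Counter*>(addr);
    counter->value += 1;  // plain store: no read()/write(), no serialization
    long long result = static_cast<long long>(counter->value);

    // On a regular file the dirty page still goes back through the kernel.
    // Assumption: on a DAX mapping, cache flushes would take this call's place.
    int rc = msync(addr, sizeof(Counter), MS_SYNC);
    munmap(addr, sizeof(Counter));
    return rc == 0 ? result : -1;
}
```

Calling `bump_counter` twice on the same path returns 1 and then 2: the second call finds the value left behind by the first without any explicit load step.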

The Persistent Memory Development Kit contains libpmemobj [1], which is almost exactly what you are imagining ;) It's a persistent heap, with transactions for durability. It's not as nice (yet) as your code snippet, but here's a C++ example [2] of a persistent queue push:

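  // Everything inside the lambda runs as one transaction on the pool.
  obj::transaction::exec_tx(pool, [this, &value] {
    // Allocate the new node on the persistent heap.
    auto n = obj::make_persistent<Node>(value, nullptr);

    if (head == nullptr) {
      head = tail = n;
    } else {
      tail->next = n;
      tail = n;
    }
  });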
`make_persistent` is akin to `make_unique`: it allocates an object of the `Node` class, only on the persistent heap. Once allocated, we can just assign the newly allocated object to a different persistent variable. No kernel code executing, no serialization ;)

[0] - https://www.kernel.org/doc/Documentation/filesystems/dax.txt

[1] - https://github.com/pmem/pmdk

[2] - https://github.com/pmem/pmdk/blob/master/src/examples/libpme...

If your data contains pointers, then for it to be round-tripped through persistence correctly, I would imagine you'd need to map it at the same virtual memory address every time. Which isn't possible. Have I got that wrong?
That's an excellent observation. You are indeed correct that pointers in memory mapped files are quite tricky to get right. When you think about shared memory in general, this isn't a new problem [0], and the solution is almost exactly the same [1]. Instead of dealing with raw pointers, the library provides an encapsulated fat pointer which contains an offset from the beginning of the mapping. And when the file is opened, we simply register the new virtual address, and calculate the real address when needed.

[0] - https://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/...

[1] - http://pmem.io/pmdk/cpp_obj/master/cpp_html/classpmem_1_1obj...
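
A minimal sketch of the offset-pointer idea, to show why the mapping address stops mattering. This is not the PMDK or Boost implementation; `OffsetPtr`, `Node`, `build_list`, `sum_list` and `relocated_sum` are hypothetical names, and two in-memory buffers stand in for the same file mapped at two different addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical offset-based pointer: stores a byte offset from the start of
// the mapping instead of a virtual address. Offset 0 is reserved for "null".
template <typename T>
struct OffsetPtr {
    std::uint64_t offset;

    bool is_null() const { return offset == 0; }

    // Resolve against wherever the mapping happens to live right now.
    T* get(void* base) const {
        return is_null()
                   ? nullptr
                   : reinterpret_cast<T*>(static_cast<char*>(base) + offset);
    }
};

struct Node {
    int value;
    OffsetPtr<Node> next;
};

// Builds a two-node list (values 1 and 2) inside `pool` and returns the
// offset of the head node. Offsets 16 and 32 keep the nodes 8-byte aligned.
std::uint64_t build_list(void* pool) {
    const std::uint64_t first = 16;
    const std::uint64_t second = 32;
    char* base = static_cast<char*>(pool);
    new (base + first) Node{1, {second}};
    new (base + second) Node{2, {0}};
    return first;
}

// Walks the list using only offsets plus the current base address.
int sum_list(void* pool, std::uint64_t head_offset) {
    int sum = 0;
    OffsetPtr<Node> cur{head_offset};
    while (!cur.is_null()) {
        Node* node = cur.get(pool);
        sum += node->value;
        cur = node->next;
    }
    return sum;
}

// Simulates "close the file, reopen it at a different address".
int relocated_sum() {
    std::vector<std::uint64_t> first_mapping(8, 0);  // 64 bytes, 8-byte aligned
    std::uint64_t head = build_list(first_mapping.data());

    std::vector<std::uint64_t> second_mapping(8, 0);
    const std::size_t bytes = first_mapping.size() * sizeof(std::uint64_t);
    std::memcpy(second_mapping.data(), first_mapping.data(), bytes);
    std::memset(first_mapping.data(), 0, bytes);  // the old address is gone

    return sum_list(second_mapping.data(), head);
}
```

`relocated_sum()` still walks both nodes after the bytes have moved, because nothing stored in the pool depends on the address it was built at. With raw `Node*` members the copied list would point back into the wiped buffer.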

When you mmap() a file you can specify the virtual address so it will be the same every time.
Yes, but to accomplish that you would have to use the MAP_FIXED flag, which is quite dangerous because it can replace previous mappings. That can lead to problems with dynamic memory allocation since almost all malloc() implementations use anonymous mmap.
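
For comparison, a small sketch of what happens without MAP_FIXED, assuming Linux/POSIX mmap semantics: the address argument is only a hint, so an occupied range is never clobbered, but the caller has to check what came back. `map_near` and `hint_is_not_binding` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Maps `length` bytes of anonymous memory, passing `hint` WITHOUT MAP_FIXED.
// Returns the address the kernel actually chose, or nullptr on failure.
void* map_near(void* hint, std::size_t length) {
    void* addr = mmap(hint, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return addr == MAP_FAILED ? nullptr : addr;
}

// Requests a range that is already in use and checks that the existing
// mapping survives and the new mapping lands somewhere else.
bool hint_is_not_binding() {
    const std::size_t length = 1 << 16;
    void* first = map_near(nullptr, length);
    if (first == nullptr) return false;
    static_cast<char*>(first)[0] = 'x';

    // Same range again: it is occupied, so the kernel has to pick another one.
    void* second = map_near(first, length);
    bool ok = second != nullptr && second != first &&
              static_cast<char*>(first)[0] == 'x';

    if (second != nullptr) munmap(second, length);
    munmap(first, length);
    return ok;
}
```

With MAP_FIXED the second call would have silently replaced the first mapping. The hint form is safe, but it means code relying on raw pointers has to verify that the returned address equals the one it asked for.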
Yes, but this is a trivial problem to fix on 64-bit machines. There's so much address space that the kernel can just be told to never pick certain address ranges for unfixed mmaps, leaving the rest of the address space free for persistent heaps.

The actual hard part of persistent heaps isn't the persistence part. It's transactionality and upgrade management.
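
To give a feel for the transactionality part, here is a toy undo log, which is one common technique and not a claim about how any particular library does it. `Region`, `transfer` and `recover` are hypothetical names, the "crash" is simulated with an early return, and real persistent memory would additionally need cache flushes and fences to order the stores marked in the comments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a tiny persistent region: two values plus a
// one-entry undo log that snapshots them before they are modified.
struct Region {
    std::int64_t a;
    std::int64_t b;
    bool log_valid;
    std::int64_t saved_a;
    std::int64_t saved_b;
};

// Moves `amount` from a to b. With `crash_midway` set, it returns after the
// first store to simulate losing power in the middle of the update.
void transfer(Region& r, std::int64_t amount, bool crash_midway) {
    // 1. Snapshot the old state, then mark the log valid.
    //    (Assumption: on real hardware, flush + fence between these steps.)
    r.saved_a = r.a;
    r.saved_b = r.b;
    r.log_valid = true;

    // 2. Modify the data in place.
    r.a -= amount;
    if (crash_midway) return;
    r.b += amount;

    // 3. Commit point: once the log is invalidated, the new state is the truth.
    r.log_valid = false;
}

// Run on every startup: roll back a transaction that never committed.
void recover(Region& r) {
    if (r.log_valid) {
        r.a = r.saved_a;
        r.b = r.saved_b;
        r.log_valid = false;
    }
}
```

After an interrupted `transfer`, the region holds a half-applied update (money taken from `a` but not added to `b`), and `recover` restores the snapshot. After a completed `transfer`, `recover` does nothing.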

It is of course important to point out that this is akin to casting a struct to a void pointer and writing it to a file (it's just faster), which works extremely well but requires the data structures to have a stable memory representation. If one changes the structs in any way, old persistent data will look like garbage. One should therefore still have an extremely well-defined and versioned system for managing persistent data, rather than just arbitrarily allocating objects on the persistent heap.

It's still neat, though.
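
One way to sketch the versioning point, under the assumption that the pool starts with a small header: record a magic number, a layout version and the struct size when the pool is created, and refuse to reinterpret the bytes if any of them differ. `PoolHeader`, `Record`, `kMagic`, `kLayoutVersion`, `format_pool` and `layout_matches` are all hypothetical names and values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical record type stored in the pool.
struct Record {
    std::uint64_t id;
    double balance;
};

// Hypothetical header written at the very start of the pool.
struct PoolHeader {
    std::uint32_t magic;
    std::uint32_t layout_version;
    std::uint64_t record_size;
};

constexpr std::uint32_t kMagic = 0x4C4F4F50u;  // arbitrary tag for this sketch
constexpr std::uint32_t kLayoutVersion = 2;    // bump whenever Record changes

// Stamps a fresh pool with the layout this build of the program expects.
void format_pool(void* pool) {
    PoolHeader header{kMagic, kLayoutVersion, sizeof(Record)};
    std::memcpy(pool, &header, sizeof header);
}

// Returns true only if the pool was written with exactly this layout.
bool layout_matches(const void* pool) {
    PoolHeader header;
    std::memcpy(&header, pool, sizeof header);
    return header.magic == kMagic &&
           header.layout_version == kLayoutVersion &&
           header.record_size == sizeof(Record);
}
```

A program would call `layout_matches` right after mapping the file and either run a migration or bail out when it returns false, instead of reading old bytes through a new struct definition.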

Yup. In this respect libpmemobj could be compared to how Cap'n Proto [0] works. And of course, this has some trade-offs that users need to be aware of.

[0] - https://capnproto.org/index.html

It would appear to be treated as a type of object store that you access with a special library, or through the OS as a filesystem: https://software.intel.com/en-us/articles/introduction-to-pr...
That's very, very interesting. Many thanks for sharing.

So it's still a serialize/deserialize cycle, but the access libs built on top of the persistent memory look interesting.

I see it as a disk drive where memory mapping is a NOP and whose performance is close to RAM. So an .exe started from this disk already has all of its contents mapped to RAM, and you can create and memory-map a 512 GB file on it and use it essentially as a RAM block in your process's memory space. But you can then close and reopen it, meaning it's guaranteed to be persistent.