by boywitharupee 1114 days ago
Zero-copy with mmap was added to llama.cpp, but the way it was implemented sparked controversy.

I think GP meant zero-copy communication with the GPU, e.g., through `newBufferWithBytesNoCopy` [0], which is only possible with unified memory architectures, e.g., integrated GPUs.
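
A no-copy buffer wraps memory the process already owns instead of duplicating it, so that memory has to meet the API's layout rules; Apple's documentation for `newBufferWithBytesNoCopy` describes the pointer argument as page-aligned. The Python sketch below shows the arithmetic for widening an arbitrary region to whole pages before wrapping it. The helper `page_aligned_span` is hypothetical (not part of Metal or llama.cpp), and rounding the length up to whole pages is a conservative assumption.

```python
import mmap


def page_aligned_span(addr: int, length: int, page_size: int = mmap.PAGESIZE) -> tuple[int, int]:
    """Hypothetical helper: widen [addr, addr + length) outward to whole pages.

    Returns (aligned_start, aligned_length). Assumes a no-copy GPU wrapper
    needs a page-aligned start and, conservatively, a whole-page length.
    """
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page_size must be a positive power of two")
    mask = ~(page_size - 1)
    start = addr & mask                                # round the start down to a page boundary
    end = (addr + length + page_size - 1) & mask       # round the end up to a page boundary
    return start, end - start


# A 200-byte region straddling a 4 KiB page boundary needs two full pages.
print(page_aligned_span(4000, 200, 4096))
```

Memory returned by `mmap` already starts on a page boundary, which is one reason mapped model files pair naturally with no-copy GPU buffers.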

The mmap change was just about mapping the model files in memory instead of copying them, which has less overhead.
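
To make the difference concrete, here is a minimal Python sketch contrasting reading a weights file into a heap buffer with mapping it. The function names `load_by_copy`, `load_by_mmap`, and `tensor_view` are hypothetical and not llama.cpp's API; the point is only that the mapped version makes no private copy and pages are brought in from the page cache on first access.

```python
import mmap


def load_by_copy(path: str) -> bytes:
    # Reads the whole file into a new bytes object: every byte is copied
    # from the OS page cache into process memory before it can be used.
    with open(path, "rb") as f:
        return f.read()


def load_by_mmap(path: str) -> mmap.mmap:
    # Maps the file read-only. No private copy is made; pages are faulted
    # in lazily. Python's mmap keeps its own handle, so closing f is safe.
    # Note: mapping an empty file raises ValueError.
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def tensor_view(mapping: mmap.mmap, offset: int, nbytes: int) -> memoryview:
    # Slicing a memoryview does not copy. Slicing the mmap directly
    # (mapping[a:b]) would return a new bytes object, i.e. a copy.
    return memoryview(mapping)[offset:offset + nbytes]
```

A view returned by `tensor_view` must be released before the mapping is closed; otherwise `mmap.close()` raises `BufferError`.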

[0]: https://developer.apple.com/documentation/metal/mtldevice/14...

> I think GP meant zero-copy communication with the GPU, e.g., through `newBufferWithBytesNoCopy` [0], which is only possible with unified memory architectures, e.g., integrated GPUs.

It can still be beneficial on discrete GPUs because it avoids a copy inside the driver.

Yeah, this is precisely what I meant. I think it is possible in Vulkan too.

I brought it up because shuffling weights around takes a lot of time, uses more RAM, and saps memory bandwidth that the IGP and CPU desperately need.
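
A back-of-the-envelope way to see the cost: a plain copy reads every byte once and writes it once, so it moves roughly twice the model's size across the memory bus, and while both source and destination are resident the weights can occupy up to twice their size in RAM. The sketch below encodes that estimate; the function `copy_cost` and the example inputs (a 4 GB model, 50 GB/s of bandwidth) are hypothetical, and caches, page-cache eviction, and concurrent traffic are ignored.

```python
def copy_cost(model_bytes: int, bandwidth_bytes_per_s: float) -> tuple[float, int]:
    """Rough cost of one full copy of the weights (hypothetical estimate).

    Returns (seconds_of_bus_time, peak_resident_bytes).
    """
    traffic = 2 * model_bytes                  # read source once + write destination once
    seconds = traffic / bandwidth_bytes_per_s  # bus time consumed by the copy alone
    peak_resident = 2 * model_bytes            # source and destination both in RAM
    return seconds, peak_resident


# Hypothetical inputs: a 4 GB model on a system with 50 GB/s of memory bandwidth.
secs, peak = copy_cost(4 * 10**9, 50e9)
print(f"{secs:.2f} s of bus time, up to {peak / 1e9:.0f} GB resident")
```

On a shared-memory system that bus time is taken directly from the bandwidth the CPU and integrated GPU would otherwise use for inference.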