| HN Mirror



	by boywitharupee 1114 days ago
	Zero-copy loading with mmap was added to llama.cpp, but the way it was implemented sparked controversy.

1 comments

fathyb 1114 days ago

I think GP meant zero-copy communication with the GPU, e.g. through `newBufferWithBytesNoCopy` [0], which is only possible with unified memory architectures, e.g. integrated GPUs.

The mmap change was just about mapping the model files in memory instead of copying them, which has less overhead.

[0]: https://developer.apple.com/documentation/metal/mtldevice/14...
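
To illustrate the difference between copying a model file and mapping it, here is a minimal sketch in Python (not llama.cpp's actual C++ code). The helper names and the dummy "weights" file are hypothetical stand-ins; only the standard `mmap` module is used. It says nothing about the GPU side, which would need something like `newBufferWithBytesNoCopy` on top of the mapping.

```python
import mmap
import os
import tempfile


def make_dummy_weights(size=1 << 20):
    """Write a hypothetical stand-in for a model file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * (size // 256))
    return path


def load_by_copy(path):
    """Read the whole file into a new bytes object (a full copy in process memory)."""
    with open(path, "rb") as f:
        return f.read()


def load_by_mmap(path):
    """Map the file read-only; pages are brought in lazily on first access."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping stays valid after the
        # file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    path = make_dummy_weights()
    copied = load_by_copy(path)
    mapped = load_by_mmap(path)
    # Both expose the same bytes; only the copy allocated a second buffer.
    print(len(copied), len(mapped), mapped[:4] == copied[:4])
    mapped.close()
    os.remove(path)
```

With the mapped version, the OS page cache backs the data directly, so the process does not hold a second private copy of the file contents.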


astrange 1113 days ago

> I think GP meant zero-copy communication with the GPU, e.g. through `newBufferWithBytesNoCopy` [0], which is only possible with unified memory architectures, e.g. integrated GPUs.

It can still be beneficial on discrete GPUs because it avoids a copy inside the driver.


brucethemoose2 1114 days ago

Yeah, this is precisely what I meant. I think it is possible in Vulkan too.

I brought it up because shuffling weights around takes lots of time, takes up more RAM, and saps memory bandwidth the IGP and CPU desperately need.
