https://stackoverflow.com/questions/5518084/memorymappedfile...

There's the hard limit of 2GB for most versions of 32-bit Windows and 4GB for any 32-bit operating system. Couple that with the requirement for a contiguous address space as well as various page table entry (PTE) limits, and you get all sorts of "soft" limits well before 2GB. From what I've heard, 256MB is relatively safe to map, but anything much larger than that is increasingly likely to fail. To be properly robust, correctly written code should be able to work with moveable "windows" into the file as small as 32MB, especially if the process memory is already fragmented.

Lots of software crashes with large files on 32-bit machines because of this. E.g.: https://www.monetdb.org/pipermail/users-list/2009-January/00... As a more recent example, ripgrep had issues on 32-bit platforms because of a bug in the way the underlying mmap library worked in Rust.

Even on 64-bit platforms you can run into trouble. For example: https://jira.mongodb.org/browse/SERVER-15070

In that example, Windows Server 2008 R2 has an 8 TB limit. You could hit that if using a tool like ripgrep to do "forensic analysis" of disk images from a SAN, where virtual disks typically have a 16 TB limit. So if you mount a SAN snapshot and open the disk as a file to scan it, you will hit this limit! Programmers make all sorts of invalid assumptions...
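The "moveable windows" idea can be sketched with Python's standard `mmap` module: instead of mapping the whole file at once, map one fixed-size window at a time, so the process never needs a single huge contiguous address range. This is only an illustration under stated assumptions: `find_windowed` is a hypothetical helper, the 32MB default is taken from the comment above, and Python requires every mapping offset to be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os

DEFAULT_WINDOW = 32 * 1024 * 1024  # 32MB, the window size suggested above


def find_windowed(path: str, needle: bytes, window: int = DEFAULT_WINDOW) -> int:
    """Return the offset of the first occurrence of needle in the file, or -1.

    Hypothetical helper: maps the file one window at a time instead of all at once.
    """
    if not needle:
        return 0
    gran = mmap.ALLOCATIONGRANULARITY
    # Round the window down to a multiple of the granularity so every offset is valid.
    window = max(gran, window - window % gran)
    overlap = len(needle) - 1  # extra bytes so a match straddling two windows is seen
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window + overlap, size - offset)
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset) as m:
                i = m.find(needle)
                if i != -1:
                    return offset + i
            offset += window  # each window is unmapped before the next is mapped
    return -1
```

Because each window is unmapped before the next one is created, the address space needed stays around `window` bytes no matter how large the file is, which is what makes this approach tolerant of fragmented 32-bit processes.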
ripgrep has always had a fast traditional buffering strategy using `read` calls for searching, because I knew that mmap couldn't be used in every case.
Anyway, this has been fixed for a couple years at this point, so if you're still experiencing a problem, then please file a new bug report.
> As a more recent example, ripgrep had issues on 32-bit platforms because of a bug in the way the underlying mmap library worked in Rust.
This is false. The bug you're thinking of is probably https://github.com/BurntSushi/ripgrep/issues/922, which was not caused by an underlying bug in memmap. memmap did have a bug with respect to file offsets, but ripgrep did not use the file offset API. The bug was in ripgrep itself: I made the classic mistake of trying to predict whether an mmap call would fail instead of just trying mmap itself. That bug was fixed on master before the Windows bug was even reported: https://github.com/BurntSushi/ripgrep/commit/93943793c314e05...
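The difference between predicting and just trying can be illustrated with a minimal Python sketch: attempt the memory map, and if it fails, fall back to plain `read` calls. This is not ripgrep's code (ripgrep is written in Rust); `search_file` and `search_buffered` are hypothetical names, and the 64 KiB chunk size is an arbitrary assumption.

```python
import mmap
from typing import BinaryIO


def search_buffered(f: BinaryIO, needle: bytes, chunk_size: int = 64 * 1024) -> bool:
    """Scan f using plain read() calls, carrying a small overlap between chunks."""
    if not needle:
        return True
    keep = len(needle) - 1  # bytes that could begin a match split across two chunks
    tail = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return False
        buf = tail + chunk
        if needle in buf:
            return True
        tail = buf[-keep:] if keep else b""


def search_file(path: str, needle: bytes) -> bool:
    """Try mmap first; if the mapping is refused, fall back to buffered reads."""
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file. Python raises ValueError for an
            # empty file and OSError if the OS refuses the mapping.
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):
            return search_buffered(f, needle)
        with m:
            return m.find(needle) != -1
```

Nothing here guesses in advance whether the mapping will succeed; the failure itself is what selects the `read` path, so a wrong prediction cannot turn into a hard error.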
> You'd only try that if you haven't read the documentation for mmap, just like a bunch of Rust programmers did.
This isn't exclusive to Rust programmers. C tools make the same mistake all the time. Memory maps aren't just problematic with large files on 32-bit systems; they also don't work with virtual files on Linux. Try, for example, `ag MHz /proc/cpuinfo` and see what you get. Crazy how, you know, sometimes humans make mistakes even if they are C programmers!
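One way to see why a memory map is no help for Linux virtual files is to compare the size `stat` reports for a procfs file with the number of bytes a plain `read` actually returns. A small Linux-only Python sketch; `size_vs_read` is a hypothetical helper:

```python
import os


def size_vs_read(path: str) -> tuple[int, int]:
    """Return (size reported by stat, number of bytes a plain read returns)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as f:
        actual = len(f.read())
    return reported, actual
```

On Linux, procfs files such as `/proc/cpuinfo` report a size of 0 even though reading them yields the generated contents, so any code that sizes its mapping from `stat` sees an empty file.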
And the implication that I (or the author of memmap) never read the docs for `mmap` is just absurd.
If you're going to be snooty about stuff like this, then at least get the story correct. Or better yet, don't be snooty at all.