| HN Mirror



	by amelius 1720 days ago
	> freqfs automatically caches the most frequently-used files and backs up the others to disk. This allows the developer to create and update large collections of data purely in-memory without explicitly sync’ing to disk, while still retaining the flexibility to run on a host with extremely limited memory.

	Why not let the OS take care of this?

4 comments

haydnv 1720 days ago

One advantage is consistency across host platforms, but the main advantage is that the file data can be accessed (and mutated) in memory in a deserialized format. If you let the OS take care of it, you would still have the overhead of serializing & deserializing a file every time it's accessed.
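
A minimal sketch of the idea described above, under stated assumptions: `NumberCache`, `Entry`, `parse`, and `serialize` are hypothetical names, not the freqfs API, and the "file format" is simply one integer per line. It keeps parsed values in memory, counts hits per file, and writes the least-used entry back to disk only when room is needed, so repeated accesses skip the parse and serialize steps.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One cached file: its parsed contents plus bookkeeping for eviction.
struct Entry {
    value: Vec<u64>, // deserialized form: one integer per line of the file
    hits: u64,       // how often this entry has been requested
    dirty: bool,     // true if the in-memory value may differ from disk
}

/// Hypothetical frequency-based cache; NOT the freqfs API.
pub struct NumberCache {
    capacity: usize,
    entries: HashMap<PathBuf, Entry>,
}

impl NumberCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, entries: HashMap::new() }
    }

    /// Read access: the file is parsed only on a cache miss.
    pub fn get(&mut self, path: &Path) -> io::Result<&Vec<u64>> {
        let entry = self.load(path)?;
        Ok(&entry.value)
    }

    /// Write access: mutations stay in memory until the entry is evicted.
    pub fn get_mut(&mut self, path: &Path) -> io::Result<&mut Vec<u64>> {
        let entry = self.load(path)?;
        entry.dirty = true;
        Ok(&mut entry.value)
    }

    pub fn is_cached(&self, path: &Path) -> bool {
        self.entries.contains_key(path)
    }

    /// Ensures the file is cached (evicting first if full) and bumps its hit count.
    fn load(&mut self, path: &Path) -> io::Result<&mut Entry> {
        if !self.entries.contains_key(path) {
            if self.entries.len() >= self.capacity {
                self.evict_least_used()?;
            }
            let value = parse(&fs::read_to_string(path)?)?;
            self.entries
                .insert(path.to_path_buf(), Entry { value, hits: 0, dirty: false });
        }
        let entry = self.entries.get_mut(path).expect("entry is present");
        entry.hits += 1;
        Ok(entry)
    }

    /// Removes the entry with the fewest hits, serializing it to disk if modified.
    fn evict_least_used(&mut self) -> io::Result<()> {
        let victim = self
            .entries
            .iter()
            .min_by_key(|(_, entry)| entry.hits)
            .map(|(path, _)| path.clone());
        if let Some(path) = victim {
            if let Some(entry) = self.entries.remove(&path) {
                if entry.dirty {
                    fs::write(&path, serialize(&entry.value))?;
                }
            }
        }
        Ok(())
    }
}

/// Deserializes the assumed format: one unsigned integer per line.
fn parse(text: &str) -> io::Result<Vec<u64>> {
    text.lines()
        .map(|line| {
            line.trim()
                .parse::<u64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}

/// Serializes back to the same one-integer-per-line format.
fn serialize(values: &[u64]) -> String {
    values.iter().map(|v| format!("{}\n", v)).collect()
}
```

With a capacity of 1, calling `get_mut` on one file and then `get` on another would evict the first and write its modified contents back. This sketch ignores concurrency and crash safety, both of which a real cache would have to handle.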

dfranke 1720 days ago

That's what mmap is for.

haydnv 1720 days ago

It might be possible to replace freqfs with mmap on a POSIX OS, but a) you would still have to implement your own read-write lock, and b) you would (I think probably?) lose some consistency in behavior across different host operating systems.
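
To illustrate point (a), here is a rough sketch of read-write locking around a shared byte region. It assumes a plain `Vec<u8>` as a stand-in for the mapped memory, since the standard library has no mmap wrapper; `Region`, `concurrent_sums`, and `write_at` are hypothetical names chosen for this example.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Stand-in for a memory-mapped region (an assumption of this sketch;
/// a real mapping would come from an mmap wrapper crate).
type Region = Vec<u8>;

/// Sums the region from several reader threads, each holding a shared lock.
pub fn concurrent_sums(region: Arc<RwLock<Region>>, readers: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let region = Arc::clone(&region);
            thread::spawn(move || {
                // Any number of readers may hold the read lock at once.
                let guard = region.read().expect("lock poisoned");
                guard.iter().map(|&b| u64::from(b)).sum::<u64>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("reader panicked"))
        .collect()
}

/// Overwrites a byte range while holding the exclusive write lock.
pub fn write_at(region: &RwLock<Region>, offset: usize, data: &[u8]) -> Result<(), String> {
    let mut guard = region.write().map_err(|_| "lock poisoned".to_string())?;
    let end = offset.checked_add(data.len()).ok_or("offset overflow")?;
    let target = guard.get_mut(offset..end).ok_or("range out of bounds")?;
    target.copy_from_slice(data);
    Ok(())
}
```

An in-process `RwLock` like this coordinates only the threads of one process. Other processes that map the same file would need a separate mechanism.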

vlovich123 1720 days ago

Which OSes does this run on that don’t have some kind of mmap operation?

haydnv 1720 days ago

It should work on Windows (because tokio::fs works on Windows), although I have not personally tested this.

julian37 1720 days ago

You can do mmap on Windows, e.g. https://github.com/danburkert/memmap-rs

gpderetta 1720 days ago

mmap for reads, an explicit API for writes, à la LMDB. Buggy readers can read inconsistent data but cannot corrupt the OS.

otterley 1720 days ago

Corrupt the OS? How might that happen?

gpderetta 1720 days ago

Sorry, I meant the DB!
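
The "shared read-only view, explicit write API" split mentioned at the top of this sub-thread can be sketched roughly as follows. This is a copy-on-write stand-in using `Arc` and `Mutex` from the standard library, not LMDB's actual mechanism and not a real memory mapping; `Store`, `snapshot`, and `commit` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical store: readers get immutable snapshots, writers must go
/// through `commit`. A stand-in for a read-only mapping, not LMDB itself.
pub struct Store {
    current: Mutex<Arc<Vec<u8>>>,
}

impl Store {
    pub fn new(initial: Vec<u8>) -> Self {
        Self { current: Mutex::new(Arc::new(initial)) }
    }

    /// Read path: hands out a shared, immutable view of the current data.
    /// A buggy reader can misinterpret these bytes but cannot modify them.
    pub fn snapshot(&self) -> Arc<Vec<u8>> {
        let guard = self.current.lock().expect("lock poisoned");
        Arc::clone(&*guard)
    }

    /// Write path: the only way to change the data. The edit is applied to
    /// a private copy, which then replaces the current version.
    pub fn commit<F: FnOnce(&mut Vec<u8>)>(&self, edit: F) {
        let mut current = self.current.lock().expect("lock poisoned");
        let mut next: Vec<u8> = (**current).clone();
        edit(&mut next);
        *current = Arc::new(next);
    }
}
```

Snapshots taken before a `commit` keep seeing the old bytes, which is why a reader's mistakes stay confined to that reader. Copying the whole buffer on every write is only acceptable for a sketch.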

alexruf 1720 days ago

Personally I don’t see a scenario for myself, but I can imagine that there are some where this might be useful. But isn’t there an extremely high risk of data loss and inconsistency when adding an extra layer on top of OS file system handling?

ericbarrett 1720 days ago

Freqfs seems like a shim you'd add to an existing project for a quick optimization. Whereas mmap et al. are "better" the same way any specific, built-to-purpose code will be "better" than just bolting a framework on. Sometimes it's the right call to do the extra work; sometimes it's 100% more effort (both development and maintenance) for an extra 10% gain.

haydnv 1720 days ago

If there is any concurrent access to cached files not through freqfs, there is a risk of inconsistency and crashes.

BiteCode_dev 1720 days ago

You can pick and choose.

Maybe your OS's caching strategy isn't best for your use case. Also, you may use a network file system, or several types of FS, and want your cache warm up to be tuned up and consistent.

the8472 1720 days ago

> Maybe your OS's caching strategy isn't best for your use case.

On the other hand the OS does know about memory pressure from IO and from heap memory for the whole system. This crate will only know about cache pressure within a single process.

> Also, you may use a network file system

Which can also be set to do aggressive caching, at the expense of consistency.

> and want your cache warm up to be tuned up and consistent.

The description doesn't say that it does cache warm-up any more eagerly than regular reads would.

kbenson 1720 days ago

The benefit of bringing something in process is, as always, more control, usually at the expense of having to make decisions with less data about the rest of the system than an OS-level service would have.

Sometimes you need very explicit control over when things are read from cache and when they aren't. This can be hard with network file systems. Especially when you have two different use cases on the same filesystem, which isn't that odd, even within a single application.

shakna 1720 days ago

Presumably for a similar use case to SQLite [0]: performance. You can beat the OS, and by a noticeable margin, by doing things in memory and avoiding the I/O bottleneck.

[0] https://www.sqlite.org/fasterthanfs.html

cornstalks 1720 days ago

I think GP's point is that the OS usually has a file system cache in RAM.

topspin 1720 days ago

I think the P's point, supported by evidence, is that the OS cache is not optimal for all use cases.

edoceo 1720 days ago

No cache is optimal for all use cases. That's an impossible goal.

topspin 1720 days ago

Thus why things like Freqfs exist and we don't always "let the OS take care of this."

edoceo 1720 days ago

Yea friend, we're in violent agreement :)

fsckboy 1720 days ago

The actual question is whether the person making the choice is up to the task of measuring/proving that their manual caching is optimal for their use case. A lot of the time the answer is simply "no". For example, the people who say "just buy enough RAM and turn virtual memory off" for the most part do not understand the implications of what they are talking about.

jdeaton 1720 days ago

My thought exactly.