|
|
|
|
|
by amelius
1979 days ago
|
|
The footguns can be solved in part by the type-system (preventing certain types from being stored), and (if necessary) by cooperation with the OS (e.g. to guarantee that a file is not modified between runs). How else would you lazy-load a database of (say) 32GB into memory, almost instantly? And why require everybody to write serialization code when just allocating the data inside a mmap'ed file is so much easier? We should be focusing on new problems rather than reinventing the wheel all the time. Persistence has been an issue in computing since the start, and it's about time we put it behind us. |
|
By using an existing database engine that will do it for me. If you need to deal with that amount of data and performance is really important you have a lot more to worry about than having to use unsafe blocks to map your data structures.
Maybe we just have different experiences and work on different types of projects but I feel like being able to seamlessly dump and restore binary data transparently is both very difficult to implement reliably and quite niche.
Note in particular that machine representation is not necessarily the most optimal way to store data. For instance any kind of Vec or String in rust will use 3 usize to store length, capacity and the data pointer which on 64 bit architectures is 24 bytes. If you store many small strings and vectors it adds up to a huge amount of waste. Enum variants are also 64 bits on 64 bit architectures if I recall correctly.
For instance I use bincode with serde to serialize data between instances of my application, bincode maps almost 1:1 the objects with their binary representation. I noticed that by implementing a trivial RLE encoding scheme on top of bincode for running zeroes I can divide the average message size by a factor 2 to 3. And bincode only encodes length, not capacity.
My point being that I'm not sure that 32GB of memory-mapped data would necessarily load faster than <16GB of lightly serialized data. Of course in some cases it might, but that's sort of my point, you really need to know what you're doing if you decide to do this.