The idea is more that when you process data, if you can fit it all in memory (and you don't need lots of CPU power, etc, etc, etc) then just use one machine and don't worry about "clusterising" it.
If you're expecting growth in the size of your dataset (beyond growth in RAM size availability), then, well, maybe don't just use a single machine. Same goes for a whole bunch of similar "it's too large for a single machine" considerations.
Storing data should probably still be persisted to disk, and backed up.
There are multiple strategies that are usually handled by the database that you use. For some databases a hard power off will lose the uncommitted data, for more durable ones it waits until the write is confirmed.
Generally though, these posts are geared towards machine learning people that don't really have "live" data as frequently.
If you're expecting growth in the size of your dataset (beyond growth in RAM size availability), then, well, maybe don't just use a single machine. Same goes for a whole bunch of similar "it's too large for a single machine" considerations.
Storing data should probably still be persisted to disk, and backed up.