Hacker News
by laumars 2767 days ago
3MB is not enough to store the kind of data you’re talking about. In your example you were copying files from an SSD to a removable device; what happens if that file is 4MB? Or 20MB?

Some program executables are multiple megabytes in size these days; your CPU cache is much more valuable storing those than it is storing a random 20MB XLSX file you happened to copy half an hour earlier. ;)

> You do. You show wrong info in the worst case.

Yes, I’d repeatedly said that myself. However I was making a case study rather than giving a lecture on how your tool should be written. Remember when I said “there is no right or wrong answer”? (You have ignored my point that you cannot guarantee the kernel’s / fs driver’s cache isn’t also stale, though. That does happen in some rare edge cases.)

My point is you shouldn’t be so certain that your method is the best, because there are plenty of scenarios where accuracy and low memory usage are more costly (eg the fastest file copying tools are those that buffer data, costing more RAM during usage). Plus let’s not forget that syscalls are expensive too. But that’s the trade-off you take, and that is what is advantageous for the purpose you’re targeting nnn for.
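To make the buffering trade-off concrete, here is a minimal Python sketch (not how nnn or any real copy tool is implemented). It copies a temporary file with two buffer sizes and counts the `read()` calls needed. The helper names `copy_with_buffer` and `demo`, the 1 MiB file size and the two buffer sizes are all made up for illustration.

```python
import os
import tempfile


def copy_with_buffer(src_path, dst_path, buf_size):
    """Copy src to dst with raw reads of buf_size bytes.

    Returns the number of read() calls that returned data, a rough
    proxy for how many read syscalls the copy needed.
    """
    reads = 0
    # buffering=0 gives raw file objects, so Python adds no buffer of its own
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            chunk = src.read(buf_size)
            if not chunk:
                break
            reads += 1
            # Raw writes may be short, so loop until the chunk is written
            view = memoryview(chunk)
            while view:
                written = dst.write(view)
                view = view[written:]
    return reads


def demo(size=1 << 20):
    """Copy a hypothetical 1 MiB file with a small and a large buffer."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(size))
        for buf_size in (4096, 256 * 1024):
            dst = os.path.join(tmp, "dst_%d.bin" % buf_size)
            results[buf_size] = copy_with_buffer(src, dst, buf_size)
    return results


if __name__ == "__main__":
    for buf_size, reads in demo().items():
        print("buffer %7d bytes -> %d read() calls" % (buf_size, reads))
```

The larger buffer holds more RAM for the duration of the copy but needs far fewer calls into the kernel, which is the cost being weighed here.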

So there really isn’t a right or wrong answer here at all, and memory usage is in fact only a very small part of the equation (which is what the other project’s author was trying to say).


> Some program executables are multiple megabytes in size these days

I have shared the figures from my system right now in my earlier comment. The memory required for the meta info is extremely cache-able.

And yes, let's call it a day now. In fact, I am happy we had a very informative discussion towards the end. This is what I expected from the other dev and you from the very beginning. I am always ready for a productive non-abusive technical discussion. And if I find someday `ncdu` takes comparable/lesser memory than `nnn` I will update that data verbatim as well. But I wouldn't take uncalled-for shitstorm in public forums from strangers lying low.

But that’s what you’re storing in the system memory, not CPU cache. Ok, let’s say the kernel does cache recently accessed memory in the L2 cache; you’re still competing with everything else, particularly in a multi-user or multi-threaded system. So that is not going to stay in L2 for long.

You’re making so many assumptions about the host platform and they simply don’t stack up for the majority of cases - it’s just that most systems are fast enough to hide your incorrect assumptions.

Also your understanding of how file system caching works is completely off. Granted that’s not something most normal engineers need to worry about, but given the claims you’re making about nnn, I would suggest you spend a little time researching the fundamentals here.

Edit:

> I am happy we had a very informative discussion towards the end. This is what I expected from the other dev

That was exactly what the other dev was doing. You just weren’t ready to listen to him - which is why I started making comments about your “narky” replies. ;)

Well, I shared the numbers, I gave you ways to test it. We are done here.
But you’re not proving that the data you are expecting to be in L2 actually ends up in L2, let alone persists long enough to be recalled from L2 when you do your rescans. Your test is crude and thus doesn’t prove anything. Which is what we have all been telling you right from the start!

You cannot just throw numbers up and say “look I’m right” if the test is flawed from the outset.

But yes, it’s probably better we do call it a day.

I expected you would run the test. Anyway, here's the data and it aligns with my expectation and figures.

  ## Fired `nnn` in my home dir.
     - all cached, no major page faults
  
  ~$ /usr/bin/time -v nnn
  	Command being timed: "nnn"
  	User time (seconds): 0.00
  	System time (seconds): 0.00
  	Percent of CPU this job got: 0%
  	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.77
  	Average shared text size (kbytes): 0
  	Average unshared data size (kbytes): 0
  	Average stack size (kbytes): 0
  	Average total size (kbytes): 0
  	Maximum resident set size (kbytes): 3400
  	Average resident set size (kbytes): 0
  	Major (requiring I/O) page faults: 0
  	Minor (reclaiming a frame) page faults: 274
  	Voluntary context switches: 6
  	Involuntary context switches: 1
  	Swaps: 0
  	File system inputs: 0
  	File system outputs: 0
  	Socket messages sent: 0
  	Socket messages received: 0
  	Signals delivered: 0
  	Page size (bytes): 4096
  	Exit status: 0
  ~$
  
  ## Fired `nnn` on an external disk root in du mode, went into a subdir, came back to root.
     - no major page faults
     - File system inputs: 14680
     - 220 KB cached extra
  
  ~$ /usr/bin/time -v nnn -S /media/vaio/49462fdf-010c-40cc-89db-ef125e7dae99/tmp/
  	Command being timed: "nnn -S /media/vaio/49462fdf-010c-40cc-89db-ef125e7dae99/tmp/"
  	User time (seconds): 0.02
  	System time (seconds): 0.27
  	Percent of CPU this job got: 3%
  	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.30
  	Average shared text size (kbytes): 0
  	Average unshared data size (kbytes): 0
  	Average stack size (kbytes): 0
  	Average total size (kbytes): 0
  	Maximum resident set size (kbytes): 3620
  	Average resident set size (kbytes): 0
  	Major (requiring I/O) page faults: 0
  	Minor (reclaiming a frame) page faults: 337
  	Voluntary context switches: 576
  	Involuntary context switches: 2
  	Swaps: 0
  	File system inputs: 14680
  	File system outputs: 0
  	Socket messages sent: 0
  	Socket messages received: 0
  	Signals delivered: 0
  	Page size (bytes): 4096
  	Exit status: 0
  ~$
  
  ## Fired `nnn` again (around 2 mins later, I was formatting the above text) on the external disk root in du mode.
     - File system inputs: 0
  
  ** So even between 2 distinct instances of nnn, _all_ the data fetched in the first instance remained cached.
  
  ~$ /usr/bin/time -v nnn -S /media/vaio/49462fdf-010c-40cc-89db-ef125e7dae99/tmp/
  	Command being timed: "nnn -S /media/vaio/49462fdf-010c-40cc-89db-ef125e7dae99/tmp/"
  	User time (seconds): 0.00
  	System time (seconds): 0.01
  	Percent of CPU this job got: 0%
  	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.31
  	Average shared text size (kbytes): 0
  	Average unshared data size (kbytes): 0
  	Average stack size (kbytes): 0
  	Average total size (kbytes): 0
  	Maximum resident set size (kbytes): 3616
  	Average resident set size (kbytes): 0
  	Major (requiring I/O) page faults: 0
  	Minor (reclaiming a frame) page faults: 329
  	Voluntary context switches: 5
  	Involuntary context switches: 0
  	Swaps: 0
  	File system inputs: 0
  	File system outputs: 0
  	Socket messages sent: 0
  	Socket messages received: 0
  	Signals delivered: 0
  	Page size (bytes): 4096
  	Exit status: 0
  ~$
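
For anyone wanting to compare such runs without eyeballing them, here is a rough Python sketch that parses `/usr/bin/time -v` output and pairs up a few counters. The function names and the `COLD_RUN` / `WARM_RUN` labels are made up for illustration. The excerpts are copied from the second and third runs above.

```python
COLD_RUN = """\
    System time (seconds): 0.27
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.30
    Maximum resident set size (kbytes): 3620
    Major (requiring I/O) page faults: 0
    File system inputs: 14680
"""

WARM_RUN = """\
    System time (seconds): 0.01
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.31
    Maximum resident set size (kbytes): 3616
    Major (requiring I/O) page faults: 0
    File system inputs: 0
"""

# Integer counters worth comparing between two runs
COUNTERS = (
    "Maximum resident set size (kbytes)",
    "Major (requiring I/O) page faults",
    "File system inputs",
)


def parse_time_v(text):
    """Parse `/usr/bin/time -v` output into {label: raw string value}."""
    fields = {}
    for line in text.splitlines():
        # Split on the LAST ": " so "(h:mm:ss or m:ss)" stays in the label
        label, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[label] = value
    return fields


def compare_runs(first, second):
    """Return {counter: (value in first run, value in second run)}."""
    a, b = parse_time_v(first), parse_time_v(second)
    return {name: (int(a[name]), int(b[name])) for name in COUNTERS}


if __name__ == "__main__":
    for name, (cold, warm) in compare_runs(COLD_RUN, WARM_RUN).items():
        print("%-40s %8d -> %d" % (name, cold, warm))
```

Note that these counters describe block I/O and process memory. A drop in "File system inputs" suggests the reads were served from the kernel's page cache in RAM; it does not by itself show anything about CPU caches.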
Yes, I know there is a cache. I went into great detail about it. That wasn’t what I asked you to prove.

What we were arguing about was the L2 cache - which you argued could hold a 2GB file (insanity I call it) - and you’re still not proving that point of yours. In fact you’re not even checking if it’s cached in the system memory (there are syscalls to do that); you’re just checking application performance. While it’s pretty obvious that result would have been due to caching, it would have been easy to prove that more conclusively by just flushing the cache and then repeating the test (I assume you actually know how to do that? Hint: you can just write to the /proc directory from your shell to do that. It’s easy). But you didn’t even do that!
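
As a sketch of what such a check could look like on Linux, the Python below asks `mincore(2)` how many of a file's pages are resident in the page cache. It is a swapped-in illustration with assumptions, not the test anyone in this thread ran. The /proc route hinted at above is presumably `/proc/sys/vm/drop_caches`, which needs root, so this sketch uses the advisory per-file `posix_fadvise(POSIX_FADV_DONTNEED)` instead. It also assumes a 64-bit Linux where `off_t` fits in a C `long`. The helper names are made up.

```python
import ctypes
import ctypes.util
import mmap
import os

# Assumption: Linux/glibc on a 64-bit system (off_t passed as c_long)
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
MAP_FAILED = ctypes.c_void_p(-1).value


def page_cache_residency(path):
    """Return (resident_pages, total_pages) for path using mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return (0, 0)
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        # Mapping the file does not read it; it only gives mincore an address
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr is None or addr == MAP_FAILED:
            raise OSError(ctypes.get_errno(), "mmap failed")
        try:
            vec = (ctypes.c_ubyte * total)()
            if _libc.mincore(addr, size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # The low bit of each byte says whether that page is resident
            resident = sum(b & 1 for b in vec)
        finally:
            _libc.munmap(addr, size)
    finally:
        os.close(fd)
    return (resident, total)


def evict_from_page_cache(path):
    """Advise the kernel to drop path's clean cached pages (no root needed)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Calling `page_cache_residency()` before and after a run, with `evict_from_page_cache()` in between, would show whether the page cache explains a speed-up. Two caveats: the eviction is only advice and dirty pages are not dropped, and newer kernels may limit what `mincore` reports for files the caller does not own or cannot write. Either way this only covers system memory; it says nothing about L2.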

Your testing is so rudimentary at covering the cases you’re making claims about that it’s laughable. You’ve not even begun to address the “processor memory” point you were arguing.

But I honestly think we should just leave it here. You don’t need to be an expert in kernel design, nor have practical experience in writing file systems, to write a decent file manager, so all of this arguing is pointless. I just ask that you aren’t so quick to judge other solutions when it’s painfully obvious there are massive gaps in your own understanding.