Hacker News
by dredmorbius 4512 days ago
I think stating that manpages are short is a little inaccurate.

You're arguing against the data. Care to continue doing so?

There are a few outliers, but really, the average page length is pretty low.

I'm running this right now, and what I'm getting is (5831 pages processed):

    mean:  3.9 pages
    median:  2 pages
    standard deviation:  11.06 pages
If you'd prefer percentiles:

    5th %ile:  1
    25th %ile: 1
    75th %ile: 3
    95th %ile: 12
Max is 476 pages (zshall(1)).

This is on a Debian GNU/Linux system with 3479 packages installed. I checked only English manpages under /usr/share/man (there are another dozen or so pages for openjdk under /usr/lib/jvm, max length 29 pages).

Obviously, mileage may vary.

Code:

    cd /usr/share/man
    time for f in $(find man[0-9] -type f -name \*.[0-9].gz); do echo -e "$f\t\c"; man $f 2>&1 | pr | grep '      Page [0-9][0-9]* *$' | tail -1 | sed 's/^.*\(Page \)//'; done
Takes about 4.5 minutes to run on my system.

Compute statistical moments with your preferred analysis tool (I've got one I wrote for the purpose).
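
For anyone without such a tool handy, here is a minimal sketch of the summary step using sort and awk. It assumes the loop's output was saved as "name<TAB>pages" lines; the function names `page_stats` and `percentile` are made up for illustration, and the choice of sample standard deviation and nearest-rank percentiles is an assumption, not necessarily what produced the figures above.

```shell
# page_stats: read "name<TAB>pages" lines on stdin and print the count,
# mean, median, and sample standard deviation of the page counts.
page_stats() {
    cut -f2 | sort -n | awk '
        NF { v[++n] = $1 + 0; sum += v[n] }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            for (i = 1; i <= n; i++) { d = v[i] - mean; ss += d * d }
            sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
            # v[] is already sorted because the input went through sort -n
            median = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
            printf "n=%d mean=%.2f median=%g sd=%.2f\n", n, mean, median, sd
        }'
}

# percentile P: nearest-rank P-th percentile of the page counts on stdin.
percentile() {
    cut -f2 | sort -n | awk -v p="$1" '
        NF { v[++n] = $1 + 0 }
        END {
            if (n == 0) exit 1
            r = p / 100 * n
            k = (r == int(r)) ? r : int(r) + 1   # ceiling of the rank
            if (k < 1) k = 1
            print v[k]
        }'
}
```

Usage would be something like `page_stats < pages.tsv` and `percentile 95 < pages.tsv`, where pages.tsv is a hypothetical file holding the loop's output.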

There are 68 manpages 37 pages or longer (3 standard deviations over the mean: 3.9 + 3 × 11.06 ≈ 37). I see several shells (zshall, bash, tcsh), many perl utilities, and a few complex tools (mutt, openvpn, wireshark, busybox) among them. Pretty small count, actually.
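
As a sanity check on the cutoff, here is a rough sketch that recomputes it and filters the long pages. The helper names `cutoff` and `long_pages` are hypothetical, and the input is assumed to be the same "name<TAB>pages" lines as before. With the figures quoted above (mean 3.9, standard deviation 11.06), k = 3 gives about 37 pages and k = 2 gives about 26.

```shell
# cutoff MEAN SD K: print MEAN + K * SD to two decimals.
cutoff() {
    awk -v m="$1" -v s="$2" -v k="$3" 'BEGIN { printf "%.2f\n", m + k * s }'
}

# long_pages T: from "name<TAB>pages" lines on stdin, print the lines
# whose page count is at least T.
long_pages() {
    awk -F '\t' -v t="$1" '($2 + 0) >= (t + 0)'
}
```

Something like `long_pages 37 < pages.tsv | wc -l` would then give the count of outliers, with pages.tsv again standing in for wherever the loop output was saved.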

Man pages were designed before a paradigm shift from systems oriented to user oriented design and are usually written in a manner that makes them easy to write, not easy to read.

While understanding a manpage's format is useful, ANY documentation which is "easy to write, not easy to read" is a bug.

As for alternatives to manual pages, I find the Linux Documentation Project's HOWTOs to be an excellent adjunct. GNU info pages, not so much.


I meant generalizing all manpages as short is inaccurate. The original statement was that the average is 4 pages which tells you almost nothing without some idea of the spread. You have provided that, thank you.

5831 is the number of manpages processed, or the number of total pages inside them processed?

My system has 18,590 manpages. Running:

  cd /usr/share/man
  find ./man[0-9] -type f -exec zgrep '^[[:blank:]]*EXAMPLE' {} \;|wc -l
yields 17. I admit this regular expression may not match every variant of the heading, but I think manpages generally do not have an examples section. This may be because a lot of the documentation is for code that does not need examples.
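
One possible reason for the low count: the compressed files hold roff source, where section headings are written with the `.SH` macro (`.Sh` in mdoc pages), so a heading line starts with `.SH EXAMPLES` rather than with `EXAMPLE`. A hedged sketch that matches the macro and counts files rather than matching lines, with `count_example_pages` being a made-up name:

```shell
# count_example_pages DIR: count compressed manpages under DIR whose roff
# source has a section heading (.SH, or mdoc .Sh) starting with EXAMPLE.
count_example_pages() {
    find "$1" -type f -name '*.gz' -exec sh -c '
        for f in "$@"; do
            # Decompress and test for the heading; print each matching file once.
            gzip -dc "$f" | grep -Eq "^\.S[Hh][[:blank:]]+\"?EXAMPLE" && echo "$f"
        done' sh {} + | wc -l
}
```

Running `count_example_pages /usr/share/man` would then report pages with such a heading. Pages that pull their content in from another file via `.so` are not followed here, so the result is only an approximation.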

  for i in $(seq 1 9);do echo $i $(ls man$i|wc -l);done
  1 2554
  2 454
  3 16014
  4 45
  5 346
  6 49
  7 284
  8 821
I don't have any man 9 pages, and cropped the error from that part.

While it may be true that documentation that is "easy to write, not easy to read" is a bug, I think it is very unlikely that the average person would file a bug report for documentation.

I did not know about The Linux Documentation Project's HOWTOs, very neat.

EDIT: I also wanted to point out that the statistics differ depending on which manual section you look at.

  man1 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   5.519   4.000 470.000

  man2 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   2.000   3.018   3.000  31.000 

  man3 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   3.067   3.000 113.000 

  man4 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    2.00    3.95    5.00   21.00 

  man5 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   5.283   5.000 149.000 

  man6 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   2.306   2.000  15.000 

  man7 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.000   3.000   5.808   6.000  69.000 

  man8 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    2.00    3.06    3.00   77.00
5831 is the number of manpages processed, or the number of total pages inside them processed?

I'm not sure I follow your question. A man file contains a single manpage. My 'find' command explicitly specified objects of type file rather than symlinks to avoid double-counting manfiles with multiple symlinks (I find 2527 symlinks under /usr/share/man/man*). That excludes hard links -- turns out that agetty.8 and getty.8 are hard linked:

    $ find man* -type f ! -links 1
    man8/agetty.8.gz
    man8/getty.8.gz
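
To collapse hard links as well, one option is to count distinct inode numbers instead of names. This is only a sketch: `count_unique_files` is a hypothetical helper, and it assumes everything under the directory lives on one filesystem and that no file name contains a newline.

```shell
# count_unique_files DIR: count regular files under DIR, counting each set
# of hard-linked names once (one count per inode number).
count_unique_files() {
    find "$1" -type f -exec ls -i {} + | awk '{ print $1 }' | sort -u | wc -l
}
```

On the system above, `count_unique_files /usr/share/man` would count agetty.8.gz and getty.8.gz as a single page.
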
My system has 18,590 manpages.

Mind if I ask what OS you're running? How many installed packages? And have you de-duplicated the manpages?

I think it is very unlikely that the average person would file a bug report for documentation.

I've filed such bugs. It's difficult to search across all packages, but there are a few filed for bash under Debian:

http://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject...

Re: different manpage sections have different statistics: Well, yes. As I said, YMMV; however, the mean is still generally within 1-2 pages of the 4-page mean I'd first described.

Note that nowhere did I say that all manpages have EXAMPLE sections. Many (most?) don't. Hrm ...

Of 3066 manpages in man1, I find 519 contain lines beginning with "EXAMPLE" in formatted output. That's roughly 17%.

My bigger point is that addressing this deficiency rather than creating de novo documentation projects would be highly preferable. Much less glory, sadly.

I found the term page ambiguous just on the very first use, whether you meant printed pages, or manpages.

I am running Arch Linux, which would normally be light, but unfortunately I work with a lot of different programming languages and prefer different tools for each. I did not deduplicate any of the manpages. The number of packages that I have installed, as listed by `pacman -Q|wc -l`, is 1241.

I agree that it would be nice to have a single place to go to get all of the details, but I think manpages suffer from more than just a lack of examples. I think that someone just starting with *nix is likely to look online. While you can search manpages with apropos or man -k, that doesn't search the whole document, and people just starting out don't know the terminal commands to search further.
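
For what it's worth, a crude full-text search over the page sources can be sketched in a few lines. apropos only looks at page names and one-line descriptions, whereas this decompresses every file and greps its body. The name `man_fulltext` is made up, and the search runs over raw roff source, so markup can get in the way of multi-word patterns. man-db's man also has a -K option that searches page text, which is probably the simpler route where it is available.

```shell
# man_fulltext PATTERN DIR: list compressed manpages under DIR whose
# source contains PATTERN (case-insensitive basic regular expression).
man_fulltext() {
    find "$2" -type f -name '*.gz' -exec sh -c '
        pat=$1; shift
        for f in "$@"; do
            gzip -dc "$f" | grep -qi -e "$pat" && echo "$f"
        done' sh "$1" {} + | sort
}
```

For example, `man_fulltext 'hard link' /usr/share/man/man1` would list section 1 pages that mention hard links anywhere in their text.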

I would rather have 2 commands and have all of the information than just manpages that don't get frequently updated.

I found the term page ambiguous just on the very first use, whether you meant printed pages, or manpages.

"man page" == a given 'Nix feature's manual entry. I've tried to use "manpage" where this is being presented.

"Pages" without the "man" modifier indicates the number of printed pages produced (defaulting to one page per 56 lines of output using 'pr'). The page count is actually reduced somewhat when printing postscript: 72 vs. 98 pages for bash(1).

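The 56-line figure follows from pr's defaults: a 66-line page minus a 5-line header and a 5-line trailer. A small sketch of the resulting arithmetic, with `pages_for_lines` as a hypothetical helper and the lines-per-page value left overridable:

```shell
# pages_for_lines LINES [BODY]: number of pr pages needed for LINES lines
# of text, assuming BODY text lines per page (default 56 = 66 - 5 - 5).
pages_for_lines() {
    echo $(( ($1 + ${2:-56} - 1) / ${2:-56} ))
}
```

So a page that formats to 57 lines already counts as 2 pages, which is worth keeping in mind when reading the medians above.
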
While you can search manpages with apropos or man -k...

Again: this is where tools such as Debian's dwww (also available on Debian derivatives such as Ubuntu) come in very handy: the manpages are presented as Web pages on the localhost webserver, and with an installed search engine ("swish++"), the full text of all documentation is indexed.

And just to be clear: Debian provides copious amounts of documentation packages: manpages, info pages, the Linux Documentation Project (in HTML or ASCII text: doc-linux-html / doc-linux-text), RFCs, and multiple guides and such. With dwww, all of this is presented locally as web documents, indexed, and searchable. Seems there's little reason Wikipedia couldn't be included as well, though you'd want to sync that regularly.