| > you shouldn't really need a dedicate script to do that It isn't because the terminal is mis-configured and/or mis-interpreting those characters, it's because these characters have historically been used in a special way by troff's ascii output driver, as used by the man command. Bold characters are "emulated" by outputting the character itself, followed by ^H, then by repeating the character, and underline is emulated by outputting _ (the underscore character) followed by ^H, followed by the character to be underlined. This is the same way that bold/underline was achieved on a manual typewriter or by old teletype terminals with paper output. Pagers like "more" & "less" have in-built behaviour that knows how to interpret these sequences and render bold or underline appropriately, and if you "cat" the file your terminal would probably ignore them, but if you open a file with those sequences in a text editor, you're going to see a bunch of unnecessary ^H characters. The OP's script uses the "col" command to remove the unnecessary ^Bs (and the preceding character) that troff has output. By default, GNU groff doesn't actually output those sequences anymore and uses ANSI escape codes instead. AFAIK many (most?) distributions actually compile out the ANSI behaviour in favour of the old way though (because "more" and "less" don't actually behave correctly with the ANSI characters by default), but some don't, which breaks the script (it's broken in Cygwin, for instance). FWIW, if you need a fool-proof way to convert a man page to plain old ASCII with no escapes at all, it's easiest just to redirect the output of "man" to a file: man ls > ls.txt
The long-winded way (with groff) is something like: zcat /usr/share/man/man1/ls.1.gz | groff -man -Tascii -P -cbdu > ls.txt
|
Right
>and if you "cat" the file your terminal would probably ignore them
I think the terminal does not ignore them. My guess is that it interprets the characters, but it has no noticeable visual effect on the screen. E.g. if the letter "c" in "cat" is output as "c^hc" (to make in bold in print), the terminal will just print "c", a backspace, then "c" again, which to the user will look the same as a single "c".