by senozhatsky 2336 days ago

   > printf("%s\n", "Hello, World!");
   >
   > That's an awful lot of symbolic syntax.
Well... Because it should have been

    printf("Hello, World!\n");
in the first place?

One can do something like

    printf("%s,%s%c\n", "Hello", "World", '!');
and claim that C is awful and that

    displayln("Hello, World!");
is so much better.

Firstly, I think you're taking this waaaayyyy more seriously than it was intended. Secondly

  double foo = 1.2;
  printf(foo);
  puts(foo);
won't compile, while

  double foo = 1.2;
  display(foo);
works fine.

Incidentally, I actually think display is the only thing on this list that is probably worth using. You could also probably extend it to accept multiple arguments relatively simply.

How about just

    puts("Hello, World!")
which has been in C since the dawn of time?
GCC at least will compile a printf without special formatting characters as a puts, so it ends up being six of one in the end.
Looks even better!
I think this is part of the problem with the language: a fragmented set of keywords, all with slightly different behavior.
puts and printf aren't keywords, they're regular functions.
Well, sort of.

Specifically, printf is an oddball function because it uses the varargs mechanism, and the whole format strings mechanism is inherently risky because it effectively bypasses the type system and says "trust me." Back when I was learning C, on a Mac with THINK C, misusing printf was a sure-fire way to crash the computer very quickly, especially since misaligned accesses of 16-bit or 32-bit words caused crashes. Compilers now go to a great deal of trouble to try to do additional safety and consistency checks.

Don't get me wrong, I grew up using printf, and it is massively useful. But it was designed when computers were much smaller and simpler, and design tradeoffs were made back then that probably wouldn't be chosen today. So printf, along with a whole family of related functions, has been a seething mess of a security and safety hole longer than most programmers have been alive.

No.

The popular C compilers have a feature where they will do some additional type checking on the arguments passed to "format" functions. You can mark your own functions with this attribute.

See the format attribute https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute....
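
As a rough sketch of what marking your own function looks like: `log_msg` and the `PRINTF_LIKE` macro below are hypothetical names made up for illustration, and the attribute is only applied when the compiler identifies itself as GCC-compatible.

```c
#include <stdarg.h>
#include <stdio.h>

/* PRINTF_LIKE is a hypothetical convenience macro, not a standard one.
   It expands to the GCC/Clang format attribute where available. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 1 is the format string; checked arguments start at argument 2. */
int log_msg(const char *fmt, ...) PRINTF_LIKE(1, 2);

/* Hypothetical wrapper: forwards its arguments to vprintf and
   returns the number of characters written. */
int log_msg(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int written = vprintf(fmt, ap);
    va_end(ap);
    return written;
}
```

With this declaration in place, a call such as `log_msg("%s\n", 3)` should draw the same -Wformat warning (enabled by -Wall) that the equivalent printf call would.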

printf is not an oddball function. Also, typechecking format strings in general does not have to be that complicated. They are still used in golang.

Of all the security pitfalls of C, the format string design of printf is way down the list. As others have noted, printf is not what makes the C type system weak.

Nope, printf is a regular function - regular functions can use varargs just fine.
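
For illustration, here is a minimal variadic function written in plain C. `sum_ints` is a made-up example, not anything from the article; it shows both halves of the argument: it is an ordinary function, and nothing checks that the caller's count matches what was actually passed.

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments.
   The callee has to trust `count`; reading more arguments than were
   passed, or reading them with the wrong type, is undefined behaviour. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* caller promised these are ints */
    }
    va_end(ap);

    return total;
}
```

Calling `sum_ints(3, 1, 2, 3)` adds the three trailing arguments; calling `sum_ints(3, 1)` compiles just as happily, which is the "trust me" part.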

And it's C; everything bypasses the type system and says "trust me". Memory allocation bypasses the type system and says "trust me".

  struct foo* foo = malloc(sizeof foo);
  // yep, this is definitely the right number of bytes
If you want strong typing (!= static typing), C is not the language you should be using, printf or no printf.
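
To spell out the joke in that snippet: `sizeof foo` there is the size of the pointer variable, not of the struct, and the compiler accepts it without complaint. A sketch of the usual idiom follows; the struct layout and the `foo_new` helper are invented for the example.

```c
#include <stdlib.h>

/* Hypothetical struct, just large enough to differ from a pointer. */
struct foo {
    char name[64];
    int id;
};

/* Hypothetical constructor: allocates one struct foo, or returns NULL. */
struct foo *foo_new(int id)
{
    /* sizeof *p is the size of the pointed-to object.
       sizeof p would be the size of the pointer itself. */
    struct foo *p = malloc(sizeof *p);
    if (p != NULL) {
        p->name[0] = '\0';
        p->id = id;
    }
    return p;
}
```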
Format strings (... at least, static format strings) don't have to bypass the type system.
Sure, but that's varargs being the special-cased bit, not printf (I've written printf implementations for some embedded systems, it's always just regular C code).
This works for a literal, but the author was probably thinking of "print a string that comes from somewhere else", where you do the first thing to avoid format characters being interpreted.
Could be. But wouldn't then displayln() require the same string placeholder? Be it "%s", or "{1}" or something else.
displayln uses the massive _Generic in display_format to supply a large number of formatters automatically. [0]

[0] https://gist.github.com/shakna-israel/4fd31ee469274aa49f8f97...

I see. So when for a char pointer one needs to do

        printf("%p\n", v);
the Generic call must look like this

        displayln((const void *)v);
is it really better?
Or you could have the pointer be a pointer before you feed it to the function.

    void* v;
    v = ...;

    ...

    displayln(v);
The number of ellipses shows that your solution is just as heavyweight.
displayln is a _Generic.

All of these are valid:

    displayln("Hello, World!");
    displayln(100);
    displayln(1.8);
The point is, for simple things, to not have to specify how they appear.
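
A much-reduced sketch of the idea, assuming a C11 compiler. This is not the gist's implementation: `show_line` and its helper functions are hypothetical names, and only a handful of types are covered.

```c
#include <stdio.h>

/* One small printer per supported type; each returns printf's result. */
static int show_int(int v)          { return printf("%d\n", v); }
static int show_double(double v)    { return printf("%g\n", v); }
static int show_str(const char *v)  { return printf("%s\n", v); }
static int show_ptr(const void *v)  { return printf("%p\n", v); }

/* _Generic picks a printer from the static type of the argument.
   A type that is not listed here is a compile-time error. */
#define show_line(x) _Generic((x), \
    int:          show_int,        \
    double:       show_double,     \
    char *:       show_str,        \
    const char *: show_str,        \
    void *:       show_ptr,        \
    const void *: show_ptr)(x)
```

Here `show_line("Hello, World!")`, `show_line(100)` and `show_line(1.8)` each pick their own formatter, while printing a char pointer as an address still needs the `(const void *)` cast discussed above, because the selection is driven purely by the static type.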

> Well... Because it should have been

> printf("Hello, World!\n");

No. You don't really want to do that. If you're doing that, use puts [0]. All this requires is a modification to one string in memory and you have an injection vulnerability.

[0] http://www.cplusplus.com/reference/cstdio/puts/

GCC automatically replaces printf("foo\n") with puts("foo") even with -O0. Clang does it too, albeit I have to enable optimizations: https://godbolt.org/z/drw4xP . As a result I never use puts for literal strings, this way if I want to add dynamic parameters later I don't have to change the function call.

I'm pathologically lazy.

Also as others point out typically the literal string will be in a ro segment so tampering with it won't be easy unless the code runs in a rather exotic environment.

The good practice that you seem to have read about and badly misunderstood is that the format string argument to a printf family function should always be a string literal right there in the code calling it. The concept of changing printf("foo") to printf("%s", "foo") for 'security' against your own hardcoded string literal is more bizarre than any of the intentionally bizarre convolutions in your posted project.
> No. You don't really want to do that. If you're doing that, use puts

This is not the case for RO strings, is it?

You're not guaranteed to have a RO string by the standard are you?
Good point. I suppose that you are talking about overwriting the terminating NUL, do I get it right? puts() is vulnerable in exactly the same way.
No, a format string attack. If you replace the start of the string with various specifiers, you can lift out pointer addresses and write to them. You can't do that with puts. Worst you can do with puts is read.
OK, understood. So what you meant was more like

   puts()'s attack surface is smaller than printf()'s.
This is not how your message appeared to me - "printf() is vulnerable to injections, use puts() instead". Both are vulnerable to "unintended read()-s".
If the string was in a variable, yes, but in this case most implementations will put it in a RO section.
That'll be depending on undefined behaviour, though, correct?
That's not UB, that's implementation dependent, AFAIK the C standard says nothing about read-only memory. Attempting to modify a string literal is indeed UB but that would only happen if an attacker managed to attempt to modify the string, not when the program is used normally.
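
To make the distinction concrete, a small sketch; `capitalised_greeting` is a hypothetical helper. Writing to an array initialised from a literal is fine, while writing through a pointer to the literal itself is the undefined case, wherever the implementation happens to store it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "Hello, world" into caller-provided storage. */
void capitalised_greeting(char *out, size_t out_size)
{
    /* An array initialised from a literal is an ordinary writable copy. */
    char copy[] = "hello, world";
    copy[0] = 'H';

    /* By contrast, given
           const char *lit = "hello, world";
       a write such as ((char *)lit)[0] = 'H' is undefined behaviour,
       whether or not the literal was placed in read-only memory. */

    snprintf(out, out_size, "%s", copy);
}
```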
> That's not UB, that's implementation dependent, AFAIK the C standard says nothing about read-only memory.

If we're being pedantic, it's _unspecified behaviour_. The implementation isn't required to document how it would behave.

Other important security features like W^X are not specified in the C standard either, so what's the point of worrying about just printf format strings?
TBH it’s not pedantic to understand the difference between undefined, implementation-defined, and unspecified behavior. It’s part of knowing the language. One of my standard interview questions for C candidates is to ask them to describe and provide an example of each.
Is it really not UB to attempt to modify RO things? Typically (on major implementations, e.g. GNU/Linux or MSVC/Windows) it will crash, and I don't think it is allowed to crash on non-UB things?
Can you please exemplify how you exploit a printf("Hello, World!\n") ?
That's a format string attack [0].

By modifying the start of that string, you can begin reading and writing to various parts of the stack.

Whilst implementations may inline that string into a RO memory region - that's not defined behaviour, so you shouldn't depend on it.

[0] https://owasp.org/www-community/attacks/Format_string_attack

> By modifying the start of that string

In order to modify that string, even in RW pages, the attacker already has to have access, at which point the point is moot. It's like saying "if you can change memory, then you can change memory"....

Agreed. If the attacker can modify string constants, is

    printf("Hello, World!\n");
really any safer than this?

    printf("%s\n", "Hello, World!");
No, they’re equivalent in terms of security.

The article’s author (posting here on HN) is grossly mistaken.

By that same logic, puts("Hello, world!"); is also vulnerable to DoS attack and information leak since someone could have removed the NUL terminator at the end of the string and have puts() read uninitialized/unmapped memory. Which is absurd logic.
Format string attacks have occurred in the wild. [0]

> Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code.

They are not the same as puts. Puts can allow you to potentially read memory.

A format string attack can allow you to write to memory.

[0] https://en.wikipedia.org/wiki/Uncontrolled_format_string

Yes if the attacker has control of the string (like if you do printf(getenv("FOO")) or something equally stupid).
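
For the case where the string really does come from outside, a minimal sketch of the usual fix: keep the format string a literal and pass the untrusted data as an argument. `render_untrusted` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into `out` verbatim.
   The format string is the literal "%s", so any % sequences inside
   `untrusted` are treated as data, never as conversion specifiers. */
int render_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
}
```

Passing `"%x %n %s"` through this helper yields exactly those eight characters in the output buffer, whereas handing the same string to printf as the format would have it interpreted.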
So an attacker able to write to memory would be able to elevate into the ability to... write to memory. That doesn't sound particularly worrisome.
And what, pray tell, stops me from modifying the memory at your “%s” string’s location in memory?

Nothing.

Neither is more secure: all modern compilers put both “%s” and “Hello, world!” in rodata sections.

Your understanding of practical format string attacks is misguided.

It's also not defined whether the executable code is in RO memory or RW memory. By your argument, we should also be concerned that the attacker could modify the code directly.