admin管理员组

文章数量:1279043

I have a puzzling problem regarding optimization and pointer equality that seems to happen only in GCC from version 7.1 and higher until the latest GCC version.

The Problem can be shown with one simple example:

#include <stdio.h>
#include <stddef.h>

int main() {
    int test1[3] = {0};
    int test2[3] = {0};
    ptrdiff_t diff = test2 - test1;
    int *test3 = test1 + diff;

    printf("%p == %p = %d\n", test1 + diff, test2, test1 + diff == test2);
    printf("%p == %p = %d\n", test1 + diff + 2, test2 + 2, test1 + diff + 2 == test2 + 2);
    printf("%p == %p = %d\n", test3 + 2, test2 + 2, test3 + 2 == test2 + 2);

    printf("\n\n");

    printf("%p == %p = %d\n", &(test1 + diff)[0], &test2[0], &(test1 + diff)[0] == &test2[0]);
    printf("%p == %p = %d\n", &(test1 + diff)[2], &test2[2], &(test1 + diff)[2] == &test2[2]);
    printf("%p == %p = %d\n", &test3[2], &test2[2], &test3[2] == &test2[2]);
    
    return 0;
}

The code performs pointer arithmetic between the two arrays. The goal is to find the Element in array2 at the same offset in array1. diff is the offset of the first elements of the arrays in memory. The problem only occurs with optimization set to -O1 or higher.

The expected output would be for all checks to be = 1 however this program will print

0x7fff4aa00e78 == 0x7fff4aa00e78 = 1
0x7fff4aa00e80 == 0x7fff4aa00e80 = 0
0x7fff4aa00e80 == 0x7fff4aa00e80 = 1


0x7fff4aa00e78 == 0x7fff4aa00e78 = 1
0x7fff4aa00e80 == 0x7fff4aa00e80 = 0
0x7fff4aa00e80 == 0x7fff4aa00e80 = 1

If I perform the check of the pointer to the first element directly it shows as equal but if i try to compare the nth element in the array the comparison shows as false, even though the address is clearly the same.

More interesting even, if i store the result of the arithmetic operation in a new pointer and compare the array elements from there, it shows up as equal again.

I have searched far and wide and have not found a suitable explanation for this problem other than gcc performing an invalid optimization here. The closest i could find was described in this Blogpost .html. However the problem in the Blog targets the "one element after" specifically which is not the case in my problem.

I could find no other compiler that exhibits this exact problem (tested with several MSVC and clang versions). Compiling with GCC 6.5 shows the correct output.

I have a puzzling problem regarding optimization and pointer equality that seems to happen only in GCC from version 7.1 and higher until the latest GCC version.

The Problem can be shown with one simple example:

#include <stdio.h>
#include <stddef.h>

int main() {
    int test1[3] = {0};
    int test2[3] = {0};
    ptrdiff_t diff = test2 - test1;
    int *test3 = test1 + diff;

    printf("%p == %p = %d\n", test1 + diff, test2, test1 + diff == test2);
    printf("%p == %p = %d\n", test1 + diff + 2, test2 + 2, test1 + diff + 2 == test2 + 2);
    printf("%p == %p = %d\n", test3 + 2, test2 + 2, test3 + 2 == test2 + 2);

    printf("\n\n");

    printf("%p == %p = %d\n", &(test1 + diff)[0], &test2[0], &(test1 + diff)[0] == &test2[0]);
    printf("%p == %p = %d\n", &(test1 + diff)[2], &test2[2], &(test1 + diff)[2] == &test2[2]);
    printf("%p == %p = %d\n", &test3[2], &test2[2], &test3[2] == &test2[2]);
    
    return 0;
}

The code performs pointer arithmetic between the two arrays. The goal is to find the Element in array2 at the same offset in array1. diff is the offset of the first elements of the arrays in memory. The problem only occurs with optimization set to -O1 or higher.

The expected output would be for all checks to be = 1 however this program will print

0x7fff4aa00e78 == 0x7fff4aa00e78 = 1
0x7fff4aa00e80 == 0x7fff4aa00e80 = 0
0x7fff4aa00e80 == 0x7fff4aa00e80 = 1


0x7fff4aa00e78 == 0x7fff4aa00e78 = 1
0x7fff4aa00e80 == 0x7fff4aa00e80 = 0
0x7fff4aa00e80 == 0x7fff4aa00e80 = 1

If I perform the check of the pointer to the first element directly it shows as equal but if i try to compare the nth element in the array the comparison shows as false, even though the address is clearly the same.

More interesting even, if i store the result of the arithmetic operation in a new pointer and compare the array elements from there, it shows up as equal again.

I have searched far and wide and have not found a suitable explanation for this problem other than gcc performing an invalid optimization here. The closest i could find was described in this Blogpost http://kristerw.blogspot/2016/12/pointer-comparison-invalid-optimization.html. However the problem in the Blog targets the "one element after" specifically which is not the case in my problem.

I could find no other compiler that exhibits this exact problem (tested with several MSVC and clang versions). Compiling with GCC 6.5 shows the correct output.

Share Improve this question edited Feb 24 at 12:53 S Schulze asked Feb 24 at 12:50 S SchulzeS Schulze 1399 bronze badges 9
  • 10 To begin with, ptrdiff_t diff = test2 - test1; is undefined behavior so anything can happen. – Lundin Commented Feb 24 at 12:52
  • 1 While it's okay to compare unrelated pointers for relative position, you are not allowed to perform arithmetic on unrelated pointer. &test1[2] - &test1[0] is fine, &test2[0] - &test1[0] (which is what you're doing) is not. – Some programmer dude Commented Feb 24 at 12:54
  • 2 To elaborate on the comments above: you can only perform pointer arithmetic within the same array. – wohlstad Commented Feb 24 at 12:54
  • 1 ...and I get the "correct" result with gcc 14.2 -O3 (x86-64 Linux) but not with -O0, so yeah this is all because of undefined behavior. – Lundin Commented Feb 24 at 12:55
  • 3 @Someprogrammerdude: Re “While it's okay to compare unrelated pointers for relative position”: It is defined to compare unrelated pointers with == or != but not with <, >, <=, or >=. – Eric Postpischil Commented Feb 24 at 13:18
 |  Show 4 more comments

2 Answers 2

Reset to default 4

Pointer arithmetic in C has always only been allowed to get carried out inside an array. Where single variables like structs etc (scalars) can get treated like an array of just one single item.

Doing pointer arithmetic outside the bounds of an array is undefined behavior - anything can happen. And as we can see from your example, this appears to be the perfect example of "anything happens". What is undefined behavior and how does it work?

Specifically, C23 6.5.7 (additive operators) says:

If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined.


Update:

Trying to explain why one particular form of undefined behavior happened is a rather uninteresting exercise from which not much of value is learnt, but anyway... here goes:

Specifically, what seems to happen (for me) when optimizations are disabled -O0 is that the arithmetic actually gets carried out in the generated machine code:

    lea     rax, [rbp-40]
    lea     rdx, [rbp-28]
    sub     rax, rdx

That is, two addresses are loaded from an address relative to the stack pointer, then subtracted - as you expect would happen.

Enabling optimizations, something different happens. For the first printf line it loads the address of the first argument then copies it to the second - the compiler assuming they are the same:

    lea     rsi, [rsp+4]
    mov     rdx, rsi

There is no subtraction in run-time. Instead the compiler pre-calculates the == comparison of the addresses at compile-time as a 1:

    mov     ecx, 1

In the second printf it does the same thing (lea + mov) but at the same time the compiler simply decided no - they are not the same:

    xor     ecx, ecx

This is setting ecx to zero for the == output. It's a x86_64 convention to use xor when setting a register to zero since it's a fast instruction.

Why it decided to do that, I don't know. Maybe since it is free to come to any conclusion it likes because of undefined behavior, it went with xor since that's maybe a clock tick faster than mov or something like that.

The parameter shuffling of the first two printf parameters is probably detached from the compile-time calculation of == and that's why the compiler decided "these are the same value" and then in the next line "but they aren't equivalent".

The program already has undefined behavior starting with

ptrdiff_t diff = test2 - test1;

- can only be used between pointers into (or one past) the same array object (where non array elements are considered to be elements of a one-element array for this purpose).

So any behavior by the compiler is valid.

本文标签: cInvalid optimization for pointer comparison in GCC (introduced in GCC 71)Stack Overflow