Suppose I have a union containing an array:

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

Now suppose I have an array of these objects:

union Set sets[SIZE] = { /* ... */ };

I can safely access any uint16_t in any of the Set objects by indexing sets and then indexing the resulting x field:

uint16_t value = sets[1].x[0];

But is there any way I can safely access the same uint16_t based on its absolute position from the beginning of the entire array of Set objects? Is the following safe and guaranteed to produce the same value as the above, assuming the union has no padding?

uint16_t value = ((uint16_t *)sets)[8];

From what I could gather, this would not be valid if sets were a plain multidimensional array. But I am not sure whether those rules apply here, since the pointer I am dereferencing is derived from the main sets array in which the target value resides, so nothing is 'out of bounds' of that array. Am I wrong, or are there other undefined behaviors here?


What if sets was returned by malloc() instead of a declared array? Would the answer be any different?

Asked Feb 12 at 2:10 by CPlus; edited Feb 17 at 7:08 by starball.
  • 1 I do not have time to write up a full language-lawyer answer, but there are at least two methods. One, the uint16_t element at offset n elements from the start is obviously sets[n/8].x[n%8]. Two, it is * (uint16_t *) ((char *) &sets + n * sizeof (uint16_t)), since the C standard guarantees the bytes that represent an object (including an array) are addressable as a sequence of char elements (and you have assumed no padding). – Eric Postpischil Commented Feb 12 at 3:16
  • 1 If we go fully pedantic, the standard guarantees that the bytes representing an object are individually accessible but maybe could be interpreted not to guarantee we can convert the char * back to the type of some subobject in the middle of the object and access that subobject. In that case, we can use uint16_t t; memcpy(&t, (char *) &sets + n * sizeof t, sizeof t);, after which t will contain the value of the desired element. – Eric Postpischil Commented Feb 12 at 3:18
  • @EricPostpischil If I cast to char * to perform the arithmetic then cast back to uint16_t * the object at that location should be one of the uint16_t members of the array in the union, so that should not count as type-punning/strict-aliasing violations right? But yes, the safest way would be sets[n>>3].x[n&7] but I am wondering if there are any 'shortcuts' that will also be safe. – CPlus Commented Feb 12 at 3:25
  • Pedantically, there is no guarantee that converting a char * calculated to point to the first byte of a subobject yields a valid pointer when converted to a pointer to the subobject type. The standard only fully defines a few pointer conversions, such as that a pointer to an object can be converted to a pointer to another object type and back, and that yields something equal to the original pointer. But it does not make a guarantee if arithmetic is done on the pointer to the other type. – Eric Postpischil Commented Feb 12 at 3:43
  • @EricPostpischil I was under the impression that accessing a struct member by offset (adding an offset in bytes) and then casting back to the member is valid, as suggested in this answer. I am not sure if this extends to one value of an array member though, and I cannot yet find an authoritative reference, but I will keep looking. – CPlus Commented Feb 12 at 3:51

4 Answers


This is essentially the same as the multidimensional array case.

The initial cast is fine:

(uint16_t *)sets

This works because a pointer to a union can be converted to a pointer to each of its members. It gives you a valid pointer to the first element of sets[0].x, an array of type uint16_t[8]. The subsequent indexing:

((uint16_t *)sets)[8]

is not valid, because it dereferences a pointer one past the end of that array. As with the multidimensional array case, it does not matter that an object of the same type happens to sit immediately after the array.

The most obvious and safest option is to simply break up the index into the Set index and the x index within the set:

uint16_t value = sets[i >> 3].x[i & 7];
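A minimal sketch of that index split wrapped in a helper, in case it is useful. The names set_get_u16 and X_PER_SET are hypothetical (not from the question), and the per-union element count is derived with sizeof instead of hard-coding 8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

/* Number of uint16_t elements in one union Set (8 for the layout above).
   sizeof does not evaluate its operand, so the null pointer is never used. */
#define X_PER_SET (sizeof(((union Set *)0)->x) / sizeof(uint16_t))

/* Hypothetical helper: read the i-th uint16_t, counting from sets[0].x[0].
   count is the number of union Set objects in the array. */
static uint16_t set_get_u16(const union Set *sets, size_t count, size_t i)
{
    assert(i < count * X_PER_SET);               /* stay inside the array */
    return sets[i / X_PER_SET].x[i % X_PER_SET]; /* same as i >> 3, i & 7 */
}
```

With this, set_get_u16(sets, SIZE, 8) reads sets[1].x[0] without any pointer casts, and it does not depend on the union having no padding.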

The ((uint16_t *)sets)[8] method is not safe, because casting sets directly to uint16_t * yields a pointer to the first element of the first x array, so index 8 is out of bounds for that array.

However, a comment suggested an alternative. Any object pointer can be safely converted to a char pointer. Apply the byte offset via the char *, then convert the result to a uint16_t * and dereference it. This should be safe for the same reason that accessing a struct member via a char * and offsetof() is valid.

uint16_t value = *(uint16_t *)((char *)sets + i * sizeof(uint16_t));

The only difference between this and the first solution is that this one depends on the union having no trailing padding, whereas the first does not.
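For the fully pedantic case raised in the comments, the memcpy variant sidesteps the question of whether the char * may be converted back to uint16_t *. A rough sketch, assuming the union has no padding (checked at compile time here with a C11 _Static_assert); set_read_u16 is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

/* Assumption from the question: the union is exactly eight uint16_t wide. */
_Static_assert(sizeof(union Set) == 8 * sizeof(uint16_t),
               "union Set must have no padding");

/* Hypothetical helper: copy out the i-th uint16_t by byte offset.
   Only the object's bytes are read, so no uint16_t * is ever formed. */
static uint16_t set_read_u16(const union Set *sets, size_t count, size_t i)
{
    uint16_t value;
    assert((i + 1) * sizeof value <= count * sizeof *sets);
    memcpy(&value, (const unsigned char *)sets + i * sizeof value, sizeof value);
    return value;
}
```

Compilers commonly turn a fixed-size memcpy like this into a plain load, so this form usually costs nothing compared with the cast.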

((uint16_t *)sets) is an allowed cast since the pointer to a union may be converted to a pointer to any of the members and vice versa. sets in this case is a pointer to the first union object in the array.

((uint16_t *)sets)[8] is not OK, since it accesses the array inside the first union item out of bounds.

To get around that, one way would be to make changes on the caller side:

union another_union
{
  union Set sets[SIZE];
  uint16_t  u16 [8*SIZE];
};

union another_union au = { .sets = {/*...*/} };

uint16_t value = au.u16[8];

This is well-defined behavior (again assuming there is no padding).
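A small self-contained sketch of this wrapper approach, under the same no-padding assumption. SIZE is set to 2 purely for illustration, and flat_get is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE 2 /* illustrative value only; the question leaves SIZE open */

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

union another_union {
    union Set sets[SIZE];
    uint16_t  u16[8 * SIZE];
};

/* The flat view only lines up with the x arrays if nothing is padded. */
_Static_assert(sizeof(union another_union) == sizeof(uint16_t[8 * SIZE]),
               "union Set must have no padding");

/* Hypothetical helper: read element i through the flat uint16_t view. */
static uint16_t flat_get(const union another_union *au, size_t i)
{
    assert(i < 8 * SIZE);
    return au->u16[i];
}
```

Here the index stays within the bounds of u16, which is a genuine array of 8 * SIZE elements, so no out-of-bounds access arises.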

Alternatively, you could iterate across the whole union Set sets[SIZE] array through a character pointer, since any object in C can be inspected byte by byte:

union Set sets[SIZE];

for(unsigned char* ptr = (unsigned char*)sets;
    ptr < (unsigned char*)sets + sizeof(sets);
    ptr += sizeof(uint16_t))
{
  uint16_t value = *(uint16_t*)ptr;
  /* do something with value */
}

Supposedly this should be well-defined too, assuming that the effective type of the address obtained is actually a uint16_t, which ought to be the case if there is no padding.

Almost all compilers can be configured to process a dialect which supports non-portable type punning constructs such as those exemplified in your code, despite the refusal of the Standards Committee to acknowledge the existence of such dialects. The Standard characterizes the constructs you want to use as "non-portable or erroneous", which should hardly be surprising if (as seems likely) they're intended for machines with specific integer storage layouts. Thus, support for such constructs is a quality-of-implementation issue outside the Standard's jurisdiction, but as noted almost all compilers can be configured to support them.

Note also that jumping through hoops to be compatible with "strict aliasing" dialects may end up creating needless compatibility problems even when using -fno-strict-aliasing. For example, clang for the ARM Cortex-M0, given:

#include <stdint.h>
union uq { uint32_t ww[4]; uint16_t hh[8]; };
uint32_t test1(void *ptr)
{
    uint16_t *p = ptr;
    return p[0] | ((uint32_t)p[1] << 16);
}
uint32_t test2(void *ptr)
{
    union uq *p = ptr;
    return p->hh[0] | ((uint32_t)p->hh[1] << 16);
}

will generate code for test1 that will, when using -fno-strict-aliasing, interchangeably accept (and correctly process) a pointer to a union uq object or a pointer to any two-element segment of a uint16_t[], but the second version is apt to fail, even when using -fno-strict-aliasing, if passed an address that is 16-bit aligned but not 32-bit aligned.
