How to efficiently set a property to 0 for an entire array in C++

I have a structure "Struct" with a field "foo" and a very large static "Struct" array.

What is the most efficient way of setting the field "foo" to 0 for the entire array?

Note that I don't care about the state of the other fields after this operation. If they get cleared, that's fine.

What I tried:

for(Struct& s : array) {
    s.foo = 0;
}

and

memset(array, 0, ARRAY_SIZE);

Are there more efficient methods?

Does the position of the "foo" field in the layout of "Struct" matter?

asked Nov 21, 2024 by Emerick Gibson
  • std::array<int,128> s{}; or struct s { int x; std::array<int,128> arr; }; s{}; – Pepijn Kramer
  • You are overthinking this. When optimizations are enabled, the compiler is smart enough to generate exactly the same code for all the versions you have provided. It looks like you are a beginner, so first learn to write correct and readable code; once you have that down, start worrying about performance by thinking about algorithmic complexity. – Marek R
  • memset will mess up objects of polymorphic type, but that also makes it likely the most performant. A type-safe variant may be std::ranges::fill(array, {}); – StoryTeller - Unslander Monica
  • std::fill would be the usual C++ solution (IMHO). – Jesper Juhl
  • array = {}; as mentioned in the first comment is a simple solution. I suppose Marek calls your approaches "overkill" because you are using a complicated, powerful tool (memset) to do something that can be done in a much simpler fashion, and "overthinking" because, as they tried to explain, no difference in performance is to be expected between variants of code that have the same observable behavior. – 463035818_is_not_an_ai
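For reference, here is a minimal sketch of the type-safe alternatives the comments suggest; the Struct definition and array size below are placeholders, not the asker's actual types:

#include <algorithm>
#include <array>

struct Struct { int foo; int other; };

std::array<Struct, 1024> arr;  // hypothetical size

void reset() {
    arr = {};                                     // value-initializes every element
    std::ranges::fill(arr, Struct{});             // same effect for any range (C++20)
    std::fill(arr.begin(), arr.end(), Struct{});  // pre-C++20 equivalent
}

All three clear the whole element, not just foo, which matches the asker's note that the other fields may be clobbered.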

2 Answers

As usual, I would quote a classic:

The first rule of optimization is: Don’t do it.

The second rule of optimization (for experts only) is: Don't do it yet.

So, before writing something non-intuitive, like std::memset instead of foo = 0, at least try to benchmark your code and see whether there is any significant difference. Assuming that your struct is something like

constexpr size_t PAD_SIZE = 4;
struct S {
  char dummy[PAD_SIZE];
  int foo;
  char dummy2[PAD_SIZE];
};

I wrote a simple benchmark:

#include <array>
#include <cstddef>
#include <cstring>
#include <benchmark/benchmark.h>

constexpr size_t NUM_ELEM = 10000;

static void ClassicLoop(benchmark::State& state) {
  std::array<S, NUM_ELEM> testArr;
  for (auto _ : state) {
    for(auto& s : testArr) {
        s.foo = 0;
    }
    benchmark::DoNotOptimize(testArr);
  }
}
BENCHMARK(ClassicLoop);

static void WithMemset(benchmark::State& state) {
  std::array<S, NUM_ELEM> testArr;
  for (auto _ : state) {
    std::memset(testArr.data(), 0, NUM_ELEM*sizeof(S));
    benchmark::DoNotOptimize(testArr);
  }
}
BENCHMARK(WithMemset);

BENCHMARK_MAIN();

Which one is faster? It depends. On my PC, if your struct is "small", then WithMemset is on average ~2% faster. If the struct is "big" (say PAD_SIZE = 64), then the memset version can be twice as slow.

So, write idiomatic code from the beginning, and only try to optimize once you have identified a bottleneck.

The short answer is that I'd usually use the first loop rather than memset. It should never be slower, and it might be faster.

The longer answer mostly deals with caching. Caches work in "lines". A line is a fixed-size chunk of data (e.g., 64 bytes on most modern Intel processors). As you're zeroing foo, the CPU will read in a cache line, write to what it needs to in that cache line, then write the whole cache line back out to main memory. In your case (doing that a lot) speed is quickly limited by the speed at which the CPU can read and write cache lines, so the amount you write inside of a cache line won't usually matter much.

If each struct is small enough to fit entirely in a single cache line, then you're going to be writing to every cache line. It doesn't matter much whether you overwrite the entire struct or just foo.

But if each struct is large enough that it occupies more than one cache line (and foo fits in only one of those), then writing only to foo will avoid loading the cache lines for the rest of each struct. E.g., if one struct is 256 bytes, we can expect the memset to be about 4x slower than writing only to foo.
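To make that arithmetic concrete, here is a small sketch; the 256-byte struct layout and the 64-byte line size are assumptions chosen for illustration, not measurements:

#include <cstddef>
#include <cstdio>

struct Big {
    long long foo;       // the field being zeroed
    char payload[248];   // filler so that sizeof(Big) == 256
};

int main() {
    constexpr std::size_t kCacheLine = 64;
    std::size_t linesPerStruct = (sizeof(Big) + kCacheLine - 1) / kCacheLine;
    // Zeroing only foo dirties 1 line per struct; memset touches all of them.
    std::printf("sizeof(Big) = %zu bytes -> %zu cache lines per struct\n",
                sizeof(Big), linesPerStruct);
}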

CPUs also try to predict what memory you're going to need next, so using memory in predictable patterns can gain quite a bit of speed. You're walking through the array from beginning to end, which is about as predictable as it gets, so you're already "optimizing" in this respect.
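If you want to see the effect of access order for yourself, something along these lines works as a rough sketch; the element size, element count, and timing approach here are arbitrary choices for illustration:

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

struct S { int foo; char pad[60]; };  // roughly one cache line per element (assumed)

int main() {
    std::vector<S> data(1 << 20);
    std::vector<std::size_t> order(data.size());
    std::iota(order.begin(), order.end(), 0);

    auto run = [&](const char* name) {
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i : order) data[i].foo = 0;  // same work either way
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%s: %lld us\n", name, us);
    };

    run("sequential");                                // prefetcher-friendly order
    std::shuffle(order.begin(), order.end(), std::mt19937{42});
    run("shuffled");                                  // same work, unpredictable order
    std::printf("%d\n", data[0].foo);                 // keep the stores observable
}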

If, however, your structs formed a linked list and you only needed to zero 90% of them, chances are it would still be faster to zero all of them in order than to walk through them in an unpredictable order just to skip a few. For skipping to pay off, you'd need to avoid zeroing a substantial fraction of them (at least half or so), not just 10%.

Putting foo first in the struct can help a tiny bit, but even at best the difference is small. Reading and writing main memory will usually be the bottleneck unless the array is really tiny.
