I have a structure "Struct" with a field "foo" and a very large static "Struct" array.
What is the most efficient way of setting the field "foo" to 0 for the entire array?
Note that I don't care about the state of the other fields after this operation. If they get cleared, that's fine.
What I tried:
for (Struct& s : array) {
    s.foo = 0;
}
and
memset(array, 0, ARRAY_SIZE);
Are there more efficient methods?
Does the position of the "foo" field in the layout of "Struct" matter?
asked Nov 21, 2024 at 13:21 by Emerick Gibson, edited Nov 21, 2024 at 16:01 by genpfault
2 Answers
As usual, I would quote a classic:
The first rule of optimization is: Don’t do it.
The second rule of optimization (for experts only) is: Don’t do it yet
So, before writing something non-intuitive, like std::memset instead of foo = 0, at least benchmark your code and see whether there is any significant difference. Assuming that your struct looks something like this:
#include <cstddef>

constexpr std::size_t PAD_SIZE = 4;  // padding on each side of foo, in bytes
struct S {
    char dummy[PAD_SIZE];
    int foo;
    char dummy2[PAD_SIZE];
};
I wrote a simple benchmark:
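#include <array>
#include <cstring>
#include <benchmark/benchmark.h>  // Google Benchmark

constexpr std::size_t NUM_ELEM = 10000;

// Zero only foo in each element.
static void ClassicLoop(benchmark::State& state) {
    std::array<S, NUM_ELEM> testArr;
    for (auto _ : state) {
        for (auto& s : testArr) {
            s.foo = 0;
        }
        benchmark::DoNotOptimize(testArr);  // discourage the compiler from eliding the writes
    }
}
BENCHMARK(ClassicLoop);

// Zero every byte of the array.
static void WithMemset(benchmark::State& state) {
    std::array<S, NUM_ELEM> testArr;
    for (auto _ : state) {
        std::memset(testArr.data(), 0, NUM_ELEM * sizeof(S));
        benchmark::DoNotOptimize(testArr);
    }
}
BENCHMARK(WithMemset);

BENCHMARK_MAIN();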
Which one is faster? It depends. On my PC, if your struct is "small", then WithMemset is on average ~2% faster. If the struct is "big" (let's say PAD_SIZE = 64), then the memset version can be twice as slow.
So, write idiomatic code from the beginning, and then only when you identify the bottleneck, try to optimize.
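If a benchmark does end up favoring std::memset, the call is easier to get right behind a small wrapper. The following is a minimal sketch, not part of the original answer: zero_bytes is a hypothetical helper name, and it assumes that all-zero bytes are a valid value for every field (true for the int and char fields of S above). Note that the third argument of memset is a byte count, not an element count.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same layout as the struct assumed in the answer above.
constexpr std::size_t PAD_SIZE = 4;
struct S {
    char dummy[PAD_SIZE];
    int foo;
    char dummy2[PAD_SIZE];
};

// Hypothetical helper: zero every byte of the array, and refuse to compile
// for types where a raw byte overwrite is not allowed (e.g. polymorphic types).
template <typename T, std::size_t N>
void zero_bytes(std::array<T, N>& arr) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe for trivially copyable types");
    std::memset(arr.data(), 0, sizeof(T) * N);  // size is given in bytes
}
```

Calling zero_bytes(testArr) on a std::array<S, NUM_ELEM> clears foo along with the padding fields, which the question says is acceptable.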
The short answer is that I'd usually use the first loop rather than memset. It should never be slower, and it might be faster.
The longer answer mostly deals with caching. Caches work in "lines". A line is a fixed-size chunk of data (e.g., 64 bytes on most modern Intel processors). As you're zeroing foo, the CPU will read in a cache line, modify the bytes it needs to within that line, then write the whole cache line back out to main memory. Since you're doing that over a very large array, speed is quickly limited by the rate at which the CPU can read and write cache lines, so the amount you write inside a cache line won't usually matter much.
If each struct is small enough to fit entirely in a single cache line, then you're going to be writing to every cache line. It doesn't matter much whether you overwrite the entire struct or just foo.
But if each struct is large enough that it occupies more than one cache line (and foo fits in only one of those), then writing only to foo will avoid loading the cache lines for the rest of each struct. E.g., if one struct is 256 bytes (four 64-byte cache lines), we can expect the memset to be about 4x slower than writing only to foo.
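The arithmetic behind that estimate can be sketched as a small counting model. This is an illustration under stated assumptions, not a measurement: it assumes 64-byte cache lines, an array that starts on a line boundary, and a foo that does not straddle two lines. The function names are made up for this example.

```cpp
#include <cstddef>

constexpr std::size_t kLineSize = 64;  // assumed cache-line size in bytes

// Lines touched when every byte of the array is overwritten (the memset case).
constexpr std::size_t lines_touched_full(std::size_t count, std::size_t struct_size) {
    return (count * struct_size + kLineSize - 1) / kLineSize;
}

// Lines touched when only foo (at foo_offset inside each struct) is written.
std::size_t lines_touched_field(std::size_t count, std::size_t struct_size,
                                std::size_t foo_offset) {
    std::size_t touched = 0;
    std::size_t prev_line = static_cast<std::size_t>(-1);  // sentinel: no line seen yet
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t line = (i * struct_size + foo_offset) / kLineSize;
        if (line != prev_line) {  // addresses only increase, so a different line is a new one
            ++touched;
            prev_line = line;
        }
    }
    return touched;
}
```

For 1000 structs of 256 bytes, the model gives 4000 lines for the full overwrite against 1000 for the field-only loop, which is the 4x factor above. For 10000 structs of 12 bytes, both count the same 1875 lines, matching the point that small structs leave little to gain either way.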
CPUs also try to predict what memory you're going to need next. So using memory in predictable patterns can gain quite a bit of speed. You're walking through the array from beginning to end, which is about as predictable as it gets. So you're already "optimizing" in this respect.
If, however, your foos formed a linked list, and you only needed to zero the first 90% of them in list order, chances are that it would be faster to zero all of them by walking the array than to follow the list through memory in an unpredictable order just to avoid zeroing a few. For the list traversal to win, you'd need to be able to skip a pretty substantial percentage of them (like at least half), not just 10%.
Putting foo first in the struct can help speed slightly, but the difference is tiny even at best. Reading/writing main memory will usually be the bottleneck unless the array is really tiny.
std::array<int,128> s{}; or struct s { int x; std::array<int,128> arrl }; s{}; – Pepijn Kramer Commented Nov 21, 2024 at 13:28
memset will mess up objects of polymorphic type, but that also makes it likely the most performant. A type safe variant may be std::ranges::fill(array, {}); – StoryTeller - Unslander Monica Commented Nov 21, 2024 at 13:29
array = {}; as mentioned in the first comment is a simple solution. I suppose Marek calls your approaches "overkill" because you are using a complicated powerful tool (memset) to do something that can be done in a much simpler fashion. "overthinking" because, as they tried to explain, no difference in performance is to be expected for different variants of code when they have the same effective observable behavior. – 463035818_is_not_an_ai Commented Nov 21, 2024 at 13:47