Assuming we have just one CPU core, and given the following CAS loop for the sake of example (in Java, taken from here), although the question is about CAS loops in general, not this code in particular:

private AtomicInteger count = new AtomicInteger(0);

public void increment() {
    int current, next;
    do {
        current = count.get();
        next = current + 1;
    } while (!count.compareAndSet(current, next));
}
theoretically, nothing prevents a particular thread from getting stuck in this loop. For example, right after next = current + 1; a context switch occurs and some other thread changes the value of the atomic count. Once the unlucky thread resumes, the expression in the while statement evaluates to true, so the loop starts again. Yet again, right after next = current + 1; a context switch occurs, and things keep repeating on and on.
So, would it be correct to say that a CAS loop is safe to use because it is practically a zero-risk bet that the CPU will not perform context switches that often? By safety here I mean that no thread gets stuck in the loop forever; for example, in a mission-critical app (whatever that means), just to emphasize zero tolerance towards unhappy scenarios. If so, how do we know the CPU works like that? How does one cultivate an intuition for roughly how many operations it usually takes before the CPU performs a context switch? In fact, that is the main question I would like answered in this scenario.
However, if it is not because of that low-risk bet, then what makes us believe the loop will eventually terminate, and when? For example, what would be an upper bound on the number of retries of the loop, or are there underlying (OS-level) guarantees regarding that?
Asked Mar 5 at 22:37 by Turkhan Badalov; edited Mar 6 at 0:35 by Peter Cordes.

2 Answers
Most modern systems are SMP (multi-core and sometimes even multi-socket), so other threads can be running simultaneously with this one. That means CAS can fail even without a context switch by the core running your thread. Actual context switches happen extremely infrequently compared to how long a CAS retry loop takes, so that's pretty much a non-problem.
See Is incrementing an int effectively atomic in specific cases? for more about how real CPUs handle atomic RMWs, especially on x86.
what makes us believe the loop will eventually terminate and when?
A CAS retry loop is lock-free: at least one thread will make progress every time they all do an iteration. (Of course they don't actually run in lock-step, and a CAS attempt can only happen while holding exclusive ownership of the cache line... See Anything in std::atomic is wait-free? for another take on that.)
Java and other languages don't provide fairness guarantees for their lock-free atomics, but in practice most hardware does try to avoid starving any core of access to a cache line it's waiting for. Its CAS attempt could still fail, but you'd have to be infinitely unlucky for it to fail an infinite number of times in a row with other threads winning the race to do the CAS.
But that's assuming all threads are doing a similar-speed computation between the load and the CAS attempt; if many other threads are doing non-stop increments while you're trying to atomically replace x with slow_function(x), you might never succeed.
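As a sketch of that pattern in Java (the slowFunction here is a hypothetical placeholder for any expensive computation between the load and the CAS):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SlowCasExample {
    // Hypothetical expensive computation; stands in for slow_function(x).
    static int slowFunction(int v) {
        return v * 31 + 7;
    }

    // Atomically replaces x with slowFunction(x) and returns the new value.
    static int updateSlowly(AtomicInteger x) {
        int current, next;
        do {
            current = x.get();            // snapshot the current value
            next = slowFunction(current); // slow work happens between load and CAS
        } while (!x.compareAndSet(current, next)); // retry if x changed meanwhile
        return next;
    }

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(0);
        System.out.println(updateSlowly(x)); // 7, since slowFunction(0) == 0*31 + 7
    }
}
```

The longer slowFunction takes relative to the other threads' updates, the higher the chance the snapshot is stale by the time the CAS runs, which is exactly the starvation scenario described above.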
If there's so much contention that your CAS retry loops often retry more than once, that's not good; lock-free works best when contention is low enough that CAS retries aren't common. So, for example, you want to avoid having every thread contend to increment a single shared counter in a tight loop if you can. Break the work up into regions divided among smaller pools of threads, or have each thread claim a chunk of 16 items at a time by doing an atomic += 16 instead of += 1.
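The chunk-claiming idea can be sketched like this (CHUNK = 16 is an arbitrary illustrative batch size):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkClaim {
    private static final int CHUNK = 16;

    // Each thread claims a block of 16 indices with a single atomic RMW,
    // instead of contending on the counter once per item.
    static int claimChunk(AtomicInteger next) {
        return next.getAndAdd(CHUNK); // returns the start index of this thread's chunk
    }

    public static void main(String[] args) {
        AtomicInteger next = new AtomicInteger(0);
        int a = claimChunk(next); // this thread now owns indices 0..15
        int b = claimChunk(next); // the next claim gets 16..31
        System.out.println(a + " " + b); // 0 16
    }
}
```

One atomic operation per 16 items means roughly one sixteenth of the traffic on the contended cache line.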
But even with high contention, a loop like this tends to degrade fairly gracefully, not completely fall on its face as contention increases. Things would have to get very extreme before you'd start seeing hundreds of CAS failures in a row. Once a core gets ownership of a cache line, it only takes a few nanoseconds (tens of clock cycles) to make a CAS attempt, and latency between cores to move cache lines around is like 50 to 100 nanoseconds on typical CPUs. (The more cores, the more hops in the interconnect.)
On LL/SC machines, CAS itself and other atomic RMWs like getAndIncrement require a retry loop to avoid spurious failures (which is why C++11 compare_exchange_weak exists: a version that's allowed to fail spuriously and thus can be used in retry loops). Livelock with no threads making progress is possible in theory; avoiding that is, I think, up to CPU architects having cores hang on to cache-line ownership a bit longer, perhaps adaptively after noticing repeated failures.
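Java exposes the same distinction: since Java 9, AtomicInteger has weakCompareAndSetVolatile, which (like compare_exchange_weak) is allowed to fail spuriously and is therefore only correct inside a retry loop. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WeakCasLoop {
    // Increment using the weak CAS form, which may fail spuriously on
    // LL/SC hardware, so it must be retried in a loop.
    static int incrementWeak(AtomicInteger count) {
        int current, next;
        do {
            current = count.get();
            next = current + 1;
        } while (!count.weakCompareAndSetVolatile(current, next)); // may fail even if count is unchanged
        return next;
    }

    public static void main(String[] args) {
        AtomicInteger count = new AtomicInteger(41);
        System.out.println(incrementWeak(count)); // 42
    }
}
```

On LL/SC machines the weak form can compile to a single LL/SC pair with your loop around it, instead of the strong form's nested internal retry loop.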
Or better, architects can provide single-instruction atomic RMWs as an alternative to LL/SC, as ARM did with ARMv8.1, and preferably a rich set of atomic RMWs to directly support methods like getAndIncrement without a CAS retry loop. (CAS retry loops are still often needed, e.g. when some data in an object you're publishing needs to be stored before you CAS a reference to it, or to implement a hypothetical getAndRightShift or whatever.) See https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd/2 for benchmarks of core-to-core round-trip latency averages using LDREX/STREX vs. ARMv8.1 single-instruction CAS on a 64-core system.
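That hypothetical getAndRightShift is a good illustration of why CAS retry loops remain necessary: no hardware RMW instruction does an arbitrary transformation, so you build it out of CAS. A sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GetAndRightShift {
    // Atomically replaces value with (value >> shift), returning the OLD value.
    // No single hardware RMW performs a shift, so a CAS retry loop is required.
    static int getAndRightShift(AtomicInteger value, int shift) {
        int current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current >> shift));
        return current;
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(40);
        System.out.println(getAndRightShift(v, 2)); // returns the old value, 40
        System.out.println(v.get());                // v is now 40 >> 2 == 10
    }
}
```

Java's getAndUpdate(IntUnaryOperator) wraps exactly this kind of loop for you, applying an arbitrary pure function under CAS retry.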
For example, what would be the upper bound of the number of retries of the loop or are there some underlying (os-level) guarantees regarding that?
There is none.
So, would it be correct to say that CAS loop is safe to use because it is just a zero-risk bet to expect CPU not to do context switch so often?
It is important to distinguish between "safe" in the sense that behavior is well defined under the relevant circumstances, and "safe" in the sense that the program can be relied upon to exhibit satisfactory characteristics. Atomic CAS is safe in the former sense, but algorithms using atomic CAS are not necessarily safe in the latter sense. For instance, they don't necessarily guarantee that all threads involved make progress in finite time.
Your example is not safe in the second sense. The use of atomic CAS does not rule out the possibility of one or more threads getting stuck in the loop, except in a probabilistic sense.
In that case,
What exactly makes Compare-and-swap (CAS) loop a better choice in highly concurrent environment?
Individual atomic operations are typically much faster than taking out a lock, performing several operations, and then releasing the lock again. That can make lock-free approaches based on atomic operations consume less wall time and / or accommodate higher concurrency than lock-based approaches, at the cost of possibly consuming more CPU time. Such lock-free approaches often depend on atomic CAS.
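To make the contrast concrete, here is a minimal sketch of the same counter implemented both ways; the atomic version performs the whole update as one hardware RMW instead of acquire/update/release, and a thread preempted mid-update cannot block the others the way a preempted lock holder can:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Counters {
    // Lock-based: every increment acquires and releases a monitor.
    static class LockedCounter {
        private int value = 0;
        synchronized int increment() { return ++value; }
    }

    // Lock-free: a single atomic RMW; no thread ever blocks waiting
    // for another thread to leave a critical section.
    static class AtomicCounter {
        private final AtomicInteger value = new AtomicInteger(0);
        int increment() { return value.incrementAndGet(); }
    }

    public static void main(String[] args) {
        LockedCounter a = new LockedCounter();
        AtomicCounter b = new AtomicCounter();
        System.out.println(a.increment() + " " + b.increment()); // 1 1
    }
}
```

Both are correct; the difference shows up under contention, where the lock-based version can make waiters sleep while the atomic version keeps every thread runnable.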
But good, correct lock-free algorithms are tricky to develop. Neither lock-free approaches in general nor atomic CAS in particular is a silver bullet. And although they may provide for shorter completion times when used correctly, they do not magically give you more resources to work with, and they do not automatically use your resources more efficiently than lock-based approaches do.
How to cultivate that intuition of how many operations roughly it usually takes before CPU performs a context switch?
Don't. At least not while writing userspace code. Such intuition is not anywhere among the top things you need to bear in mind to write good concurrent code, with or without locks. Relying on such assumptions is likely to make your code brittle.
what makes us believe the loop will eventually terminate and when?
To the limited extent that we do believe that, it is based on an assumption that either the timing will eventually work out, or that the other threads contending for the same resources will eventually run out of work to do. How reliable those assumptions are depends on many factors, among them the nature of the tasks, their number, and the number of concurrent execution units available to run them.
For example, what would be the upper bound of the number of retries of the loop or are there some underlying (os-level) guarantees regarding that?
What makes you think there would be any such thing in the general case?
Comment (markspace, Mar 6 at 0:57): A typical thread runs for many thousands of cycles, maybe even 100 million cycles, before the OS suspends it. A CAS does not run once and then get stopped by the OS; CAS is non-blocking and just runs as fast as possible, continuously. Thus it would require a really insane streak of bad luck for this loop to get stuck. Basically that's why I asked for a real use case; this one is too simple to actually get stuck or do anything interesting.