Assuming we have just 1 CPU core, and given for the sake of example the following CAS loop (in Java, taken from here), although the question is about CAS loops in general, not this code in particular:

private AtomicInteger count = new AtomicInteger(0);

public void increment() {
  int current, next;
  do {
    current = count.get();
    next = current + 1;
  } while (!count.compareAndSet(current, next));
}

theoretically, nothing prevents some particular thread from getting stuck in this loop.

For example, a context switch occurs right after next = current + 1;, and some other thread changes the value of the atomic count. Once this unlucky thread resumes, the expression in the while statement evaluates to true, so the loop starts again. Yet again a context switch occurs after next = current + 1;, and things keep repeating on and on.

So, would it be correct to say that a CAS loop is safe to use because it is just a zero-risk bet that the CPU will not context-switch that often? By safety here I mean that no thread gets stuck in the loop forever; for example, in a mission-critical app (whatever that means), just to emphasize zero tolerance for unhappy scenarios. If so, how do we know the CPU works like that? How does one cultivate an intuition for roughly how many operations usually run before the CPU performs a context switch? In fact, that is the main question I would like answered in this scenario.

However, if it is not because of that low-risk bet, then what makes us believe the loop will eventually terminate, and when? For example, what would be the upper bound on the number of retries of the loop, or are there some underlying (OS-level) guarantees about that?

asked Mar 5 at 22:37 by Turkhan Badalov; edited Mar 6 at 0:35 by Peter Cordes
  • 1 Basically what you have there is a poorly designed method. Compare and swap isn't magic, you still have to define your algorithm so that it does in fact work. I think you should expand your example to something that reproduces your use case. Or otherwise improve your problem description. I think the correct way of looking at this is that one thread must succeed, so no thread loops more times than the number of threads currently trying to update that memory. That's a very high rate of concurrency, not a "low risk bet." The "low risk" case is always free and immediate. – markspace Commented Mar 5 at 23:07
  • @markspace, thanks for the comment. Regarding the "no thread loops more times than the number of threads currently trying to update that memory", I think nothing guarantees how long that "unlucky" thread will wait for its turn unless OS provides some level of fairness I suppose? Meanwhile, some other threads might update the counter many times since the "unlucky" one was put on hold. Then, once it runs it sees the value has changed and again gets on hold by OS until it resumes next time just to realize the value has changed again. Or there are some guarantees after all? – Turkhan Badalov Commented Mar 5 at 23:16
  • @markspace, "I think you should expand your example to something that reproduces your use case". I don't really have a particular case. I was reading about CAS loops and how they are better compared to locks in a concurrent system. So, wanted to know how much can I trust this loop? Well, given many systems use this approach, it must be safe anyway. For example, project reactor alone uses a lot of CAS loops (that is also why I dived in this topic). It is just me failing to see what exactly prevents the worst-case scenario happening or can we just accept it? Basically, where is normal here? :) – Turkhan Badalov Commented Mar 5 at 23:19
  • 2 I think John gave an excellent answer, I want to address one point: again gets on hold by OS until it resumes. A typical thread runs for many thousands of cycles, maybe even 100 million cycles, before the OS suspends it. A CAS does not run once and then the OS stops it, CAS is non-blocking. It just runs as fast as possible continuously. Thus it would require a really insane probability of bad luck for this loop to get stuck. Basically that's why I asked for a real use case, this one is too simple to actually get stuck or anything interesting. – markspace Commented Mar 6 at 0:57
  • 1 @TurkhanBadalov, In some programs, the possibility of the CAS failing is not decided by a strictly random process. In some programs, threads can unintentionally become synchronized with each other in a way that makes the conflict much more likely. The extreme case, where the CAS frequently fails, and always for the same thread, is a kind of starvation. – Solomon Slow Commented Mar 7 at 13:27

2 Answers


Most modern systems are SMP (multi-core and sometimes even multi-socket), so other threads can be running simultaneously with this one. That means CAS can fail even without a context switch by the core running your thread. Actual context-switches happen extremely infrequently compared to how long a CAS retry loop takes, so that's pretty much a non-problem.

See Is incrementing an int effectively atomic in specific cases? for more about how real CPUs handle atomic RMWs, especially on x86.

what makes us believe the loop will eventually terminate and when?

A CAS retry loop is lock-free: at least one thread will make progress every time they all do an iteration. (Of course they don't actually run in lock-step, and a CAS attempt can only happen while holding exclusive ownership of the cache line... See Anything in std::atomic is wait-free? for another take on that.)

Java and other languages don't provide fairness guarantees for their lock-free atomics, but in practice most hardware does try to avoid starving any core of access to a cache line it's waiting for. Its CAS attempt could still fail, but you'd have to be infinitely unlucky for it to fail an infinite number of times in a row with other threads winning the race to do the CAS.

But that's assuming all threads are doing a similar-speed computation between the load and CAS attempt; if many other threads are doing non-stop increments while you're trying to atomically replace x with slow_function(x), you might never succeed.
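To make the point concrete, here is a minimal sketch of that risky pattern (the class and method names are hypothetical): a CAS loop that runs an arbitrary transformation between the load and the CAS attempt. The longer the transformation takes, the wider the window in which another thread can change the value and force a retry.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

class TransformLoop {
    // Atomically replace x with slowFunction(x). With a cheap function and
    // low contention this succeeds on the first try; if other threads do
    // non-stop fast increments, the retry count is unbounded in theory.
    static int update(AtomicInteger x, IntUnaryOperator slowFunction) {
        int current, next;
        do {
            current = x.get();
            next = slowFunction.applyAsInt(current); // wide race window here
        } while (!x.compareAndSet(current, next));
        return next;
    }
}
```

With no contention, update(new AtomicInteger(5), v -> v * 2) performs one load, one CAS, and returns 10; the failure mode only appears when faster writers keep invalidating the loaded value.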

If there's so much contention that your CAS retry loops often retry more than once, that's not good; lock-free works best when contention is low enough that CAS retries aren't common. So, for example, avoid having every thread contend to increment a single shared counter in a tight loop if you can. Break the work up into regions that are each handled by a smaller pool of threads, or have each thread claim a chunk of 16 items by doing an atomic += 16 instead of += 1.
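The chunk-claiming idea can be sketched as follows (names are made up for illustration). Each thread performs one atomic read-modify-write per 16 work items instead of one per item, cutting traffic on the shared cache line by roughly 16x:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ChunkedCounter {
    static final int CHUNK = 16;
    static final AtomicInteger next = new AtomicInteger(0);

    // One atomic RMW hands the caller the half-open range
    // [start, start + CHUNK) of work indices to process privately.
    static int claimChunk() {
        return next.getAndAdd(CHUNK); // returns the previous value
    }
}
```

A worker thread calls claimChunk() once, processes its 16 indices with no further synchronization, and only then touches the shared counter again.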

But even with high contention, a loop like this tends to degrade fairly gracefully, not completely fall on its face as contention increases. Things would have to get very extreme before you'd start seeing hundreds of CAS failures in a row. Once a core gets ownership of a cache line, it only takes a few nanoseconds (tens of clock cycles) to make a CAS attempt, and latency between cores to move cache lines around is like 50 to 100 nanoseconds on typical CPUs. (The more cores, the more hops in the interconnect.)


On LL/SC machines, CAS itself and other atomic RMWs like getAndIncrement require a retry loop to avoid spurious failures (which is why C++11 compare_exchange_weak exists, a version that's allowed to fail spuriously and thus can be used in retry loops). Livelock with no threads making progress is possible in theory; avoiding that is I think up to CPU architects having cores hang on to cache-line ownership a bit longer, perhaps adaptively noticing that they've failed repeatedly.
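Java exposes the same weak/strong distinction: since Java 9, AtomicInteger.weakCompareAndSetPlain is allowed to fail spuriously (as an LL/SC attempt can), so, like C++ compare_exchange_weak, it is only safe inside a retry loop. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicInteger;

class WeakCasIncrement {
    // weakCompareAndSetPlain may fail spuriously and has plain (unordered)
    // memory semantics, which is fine for a standalone counter; the retry
    // loop absorbs any spurious failure.
    static int increment(AtomicInteger count) {
        int current;
        do {
            current = count.get();
        } while (!count.weakCompareAndSetPlain(current, current + 1));
        return current + 1;
    }
}
```

On a single-instruction-CAS machine like x86 the weak and strong forms compile to the same thing; on LL/SC hardware the weak form lets the JIT avoid nesting one retry loop inside another.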

Or better, architects providing single-instruction atomic RMWs as an alternative to LL/SC, like ARM did with ARMv8.1. And preferably a rich set of atomic RMWs to directly support methods like getAndIncrement without a CAS retry loop. (CAS retry loops are still often needed, like when some data in an object you're publishing needs to be stored before you CAS a reference to it, or to implement a hypothetical getAndRightShift or whatever.) See https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd/2 for benchmarks of average core-to-core round-trip latency using LDREX/STREX vs. ARMv8.1 single-instruction CAS on a 64-core system.

For example, what would be the upper bound of the number of retries of the loop or are there some underlying (os-level) guarantees regarding that?

There is none.

So, would it be correct to say that CAS loop is safe to use because it is just a zero-risk bet to expect CPU not to do context switch so fast?

It is important to distinguish between "safe" in the sense that behavior is well defined under the relevant circumstances, and "safe" in the sense that the program can be relied upon to exhibit satisfactory characteristics. Atomic CAS is safe in the former sense, but algorithms using atomic CAS are not necessarily safe in the latter sense. For instance, they don't necessarily guarantee that all threads involved make progress in finite time.

Your example is not safe in the second sense. The use of atomic CAS does not avoid the possibility of one or more threads getting stuck in the loop, unless probabilistically.

In that case,

What exactly makes a compare-and-swap (CAS) loop a better choice in a highly concurrent environment?

Individual atomic operations are typically much faster than taking out a lock, performing several operations, and then releasing the lock again. That can make lock-free approaches based on atomic operations consume less wall time and / or accommodate higher concurrency than lock-based approaches, at the cost of possibly consuming more CPU time. Such lock-free approaches often depend on atomic CAS.
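As a rough sketch of the contrast (both counters below are correct; they differ in what a losing thread does): with a lock, a blocked thread is suspended and rescheduled later; with an atomic RMW, the losing thread just retries immediately in hardware.

```java
import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    static final Object lock = new Object();
    static int lockedCount = 0;
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Lock-based: acquire, mutate, release. Contending threads block,
    // paying for context switches under heavy contention.
    static int lockedIncrement() {
        synchronized (lock) { return ++lockedCount; }
    }

    // Lock-free: a single atomic RMW. A contending thread never blocks;
    // it just retries (burning CPU instead of sleeping).
    static int atomicIncrement() {
        return atomicCount.incrementAndGet();
    }
}
```

For a single increment the atomic version is typically a fraction of the cost of the locked one, but the gap narrows, and can even reverse in CPU time, as contention grows.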

But good, correct lock-free algorithms are tricky to develop. Neither lock-free approaches in general nor atomic CAS in particular is a silver bullet. And although they may provide for shorter completion times when used correctly, they do not magically give you more resources to work with, and they do not automatically use your resources more efficiently than lock-based approaches do.

How to cultivate that intuition of how many operations roughly it usually takes before CPU performs a context switch?

Don't. At least not while writing userspace code. Such intuition is nowhere among the top things you need to bear in mind to write good concurrent code, with or without locks. Relying on such assumptions is likely to make your code brittle.

what makes us believe the loop will eventually terminate and when?

To the limited extent that we do believe that, it is based on an assumption that either the timing will eventually work out, or that the other threads contending for the same resources will eventually run out of work to do. How reliable those assumptions are depends on many factors, among them the nature of the tasks, their number, and the number of concurrent execution units available to run them.

For example, what would be the upper bound of the number of retries of the loop or are there some underlying (os-level) guarantees regarding that?

What makes you think there would be any such thing in the general case?
