cassandra - Writes fail when lightweight transactions cannot reach quorum - Stack Overflow

In a three-node Cassandra cluster, I consistently run into the same fatal situation on tables that are written solely using Cassandra's lightweight transactions (CAS).

Whenever a lightweight transaction fails to reach quorum (1/2), e.g. due to high load, every subsequent attempt to write data within a transaction fails, i.e. it does not return "[applied]"=true.

Using select * from system.paxos where cf_id=<id of table>, I see that there are entries, which I assume to be pending transactions.

Further, in /var/log/cassandra/system.log I see log entries like:

INFO  [ScheduledTasks:1] 2025-01-12 21:46:53,005 UncommittedTableData.java:567 - \
  Scheduling uncommitted paxos data merge task for <any other table>

INFO  [OptionalTasks:1] 2025-01-12 21:46:53,006 PaxosCleanupLocalCoordinator.java:89 - \
  Completing uncommitted paxos instances for <table in stalled state> on ranges

However, I can't figure out how to resolve this state. Running nodetool repair -full <keyspace> (and variations), as well as restarting all nodes, did not resolve the issue.

Further information:

Cassandra version: 4.1.5
replication strategy: SimpleStrategy
replication factor: 3

asked Jan 13 at 7:10 by PeMa, edited Jan 24 at 4:56 by Erick Ramirez

1 Answer

Lightweight transactions (LWTs) are expensive operations since they require a read-before-write, meaning the data must be read to verify the conditional IF in the statement before the write is executed.
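To make the read-before-write idea concrete, here is a minimal in-memory sketch. It is a toy model and not how Cassandra implements LWTs: the function name `compare_and_set` and the dict standing in for a table are assumptions for illustration only. It shows why a conditional write needs a read first, and how the result loosely mirrors the "[applied]" column returned by a conditional statement.

```python
from typing import Dict, Optional, Tuple


def compare_and_set(
    table: Dict[str, str],
    key: str,
    expected: Optional[str],
    new_value: str,
) -> Tuple[bool, Optional[str]]:
    """Toy model of a conditional write (hypothetical helper, not a driver API).

    expected=None plays the role of IF NOT EXISTS; any other value plays the
    role of IF <column> = <expected>.
    Returns (applied, current_value_if_not_applied).
    """
    current = table.get(key)   # the read that must happen before the write
    if current != expected:
        # Condition failed: nothing is written, the current value is reported.
        return False, current
    table[key] = new_value     # condition held, so the write goes ahead
    return True, None


if __name__ == "__main__":
    rows: Dict[str, str] = {}
    print(compare_and_set(rows, "k", None, "a"))  # first insert is applied
    print(compare_and_set(rows, "k", None, "b"))  # second insert is rejected
```

In a real cluster the read and the write each have to be agreed on by a quorum of replicas, which is where the extra cost comes from.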

Prior to Paxos v2, which was added in Cassandra 4.1 (CASSANDRA-17164), LWTs required four round-trips for the [extended] Paxos phases: prepare/promise, serial read, propose/accept, and commit. As a result, LWTs add significantly more load than regular writes, so when nodes are overloaded LWTs can be expected to perform even worse and fail to reach a quorum of replicas.
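As a rough illustration of the quorum arithmetic, the sketch below computes the quorum size for a given replication factor and checks whether a number of responding replicas is enough. The helper names are hypothetical, and reading the "(1/2)" in the question as "1 response out of the 2 required" is an assumption based on the replication factor of 3 stated there.

```python
# The four phases named above, one network round-trip each.
PAXOS_V1_PHASES = ["prepare/promise", "serial read", "propose/accept", "commit"]


def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_reach_quorum(replication_factor: int, responding_replicas: int) -> bool:
    """True if enough replicas responded to form a quorum (hypothetical helper)."""
    return responding_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # replication factor from the question
    print("quorum size:", quorum(rf))
    print("1 replica responding is enough:", can_reach_quorum(rf, 1))
    print("round-trips per LWT:", len(PAXOS_V1_PHASES))
```

With a replication factor of 3, a single slow or overloaded replica is tolerated, but a second one is enough for an LWT to miss quorum in any of the phases.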

Running a repair does not solve the underlying issue of the nodes being overloaded. In fact, repairs add even more load, like adding fuel to a cluster that's already on fire.

You should address the root cause of the problem. I recommend that you review the capacity of your cluster and analyse the utilisation of resources like disk, CPU and memory. It may be necessary for you to consider adding more nodes. Cheers!
