I am running ActiveMQ Artemis on Kubernetes and trying to configure high availability (HA) with shared storage. However, I am facing an issue where the primary pod goes into a restart loop after enabling the shared store HA policy.
My question is an extension of this one, as I am experiencing the same issue but have also experimented with an alternative setup.
What I Tried
Configured HA with shared store:
Primary Pod
<ha-policy>
  <shared-store>
    <primary>
      <failover-on-shutdown>true</failover-on-shutdown>
    </primary>
  </shared-store>
</ha-policy>
Secondary Pod
<ha-policy>
  <shared-store>
    <backup>
      <allow-failback>false</allow-failback>
      <failover-on-shutdown>true</failover-on-shutdown>
    </backup>
  </shared-store>
</ha-policy>
Observed Issue:
ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock
What I changed:
Tested running without an HA policy, but in clustered mode:
- Instead of defining an HA policy, I simply booted two clustered Artemis nodes using the same PVC (Persistent Volume Claim) for data storage.
- Behavior observed:
  - One pod becomes active while the other becomes passive.
  - This resembles an active-passive setup, even though no HA policy is explicitly defined.
Questions:
- Why does the shared store HA setup cause the "Lost NodeManager lock" error, but a simple clustered setup with shared storage works fine?
- If I continue using a clustered setup without an HA policy but with shared storage, is this an acceptable and recommended approach?
- What are the risks of running a clustered ActiveMQ Artemis setup with shared storage but without an HA policy?
1 Answer
You see "Lost NodeManager lock" when using a shared-store ha-policy because that configuration causes the broker to actively monitor the shared file lock while the broker is running.
Without a shared-store ha-policy, your primary broker might lose the shared file lock without realizing it, in which case the backup would activate and both the primary and the backup would be operating simultaneously (i.e. split brain). Therefore, I would not recommend a simple clustered setting using shared storage without a shared-store ha-policy.
I recommend you inspect the configuration and features of the shared storage device to ensure it is able to support exclusive shared file locks. I also recommend you monitor the shared storage device to ensure there are no intermittent problems that would cause the primary broker to lose its lock.
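One rough way to check whether the shared volume honors exclusive file locks is a small standalone probe built on Java NIO file locks, which, to my understanding, is the same mechanism FileLockNodeManager relies on. The sketch below is not part of Artemis: the class name SharedLockProbe, the helper tryAcquire, the Held wrapper, and the default file name probe.lock are all made up for illustration. It is a minimal sketch, assuming the PVC is mounted at the same path in both pods.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not an Artemis class.
public class SharedLockProbe {

    /** Keeps the channel and its lock together so both can be released. */
    static final class Held implements AutoCloseable {
        final FileChannel channel;
        final FileLock lock;

        Held(FileChannel channel, FileLock lock) {
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            lock.release();
            channel.close();
        }
    }

    /** Returns the held lock, or null if some other holder already has it. */
    static Held tryAcquire(Path file) throws IOException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            // Non-blocking attempt at an exclusive lock over the whole file.
            lock = channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Another channel in this same JVM already holds the lock.
        } finally {
            if (lock == null) {
                channel.close();
            }
        }
        return lock == null ? null : new Held(channel, lock);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: file on the shared volume; args[1]: seconds to hold the lock.
        Path file = Paths.get(args.length > 0 ? args[0] : "probe.lock");
        long holdSeconds = args.length > 1 ? Long.parseLong(args[1]) : 60;

        Held held = tryAcquire(file);
        if (held == null) {
            System.out.println("Lock is held elsewhere: " + file);
            return;
        }
        try {
            System.out.println("Acquired exclusive lock on " + file);
            Thread.sleep(holdSeconds * 1000);
        } finally {
            held.close();
            System.out.println("Released lock on " + file);
        }
    }
}
```

Run it in one pod against a file on the shared volume, then run it in the second pod against the same file while the first is still holding. If the second run also reports that it acquired the lock, the storage is probably not enforcing exclusive locks across nodes. A passing probe does not rule out intermittent lock loss, so it complements monitoring the storage rather than replacing it.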
You can enable TRACE logging for org.apache.activemq.artemis.core.server.impl.FileLockNodeManager to help you identify why the primary broker is losing its shared file lock.