kubernetes - ActiveMQ Artemis: Primary Pod Restart Loop with Shared Store HA - Stack Overflow

Updated: 2025-04-15

I am running ActiveMQ Artemis on Kubernetes and trying to configure high availability (HA) with shared storage. However, I am facing an issue where the primary pod goes into a restart loop after enabling the shared store HA policy.

My question is an extension of this one, as I am experiencing the same issue but have also experimented with an alternative setup.

What I Tried

Configured HA with shared store:

Primary Pod

<ha-policy>
    <shared-store>
        <primary>
            <failover-on-shutdown>true</failover-on-shutdown>
        </primary>
    </shared-store>
</ha-policy>

Secondary Pod

<ha-policy>
    <shared-store>
        <backup>
            <allow-failback>false</allow-failback>
            <failover-on-shutdown>true</failover-on-shutdown>
        </backup>
    </shared-store>
</ha-policy>

Observed Issue:

ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock

What I changed:

Tested running without an HA policy, but in clustered mode:

Instead of defining an HA policy, I simply booted two clustered Artemis nodes using the same PVC (Persistent Volume Claim) for data storage.
Behavior Observed:
- One pod becomes active while the other becomes passive.
- This resembles an active-passive setup, even though no HA policy is explicitly defined.

Questions:

Why does the shared store HA setup cause the "Lost NodeManager lock" error, but a simple clustered setup with shared storage works fine?
If I continue using a clustered setup without an HA policy but with shared storage, is this an acceptable and recommended approach?
What are the risks of running a clustered ActiveMQ Artemis setup with shared storage but without an HA policy?

Asked Mar 14 at 13:38 by Subhidh Agarwal; edited Mar 14 at 15:04 by Justin Bertram.

1 Answer

You see "Lost NodeManager lock" when using a shared-store ha-policy because that configuration causes the broker to actively monitor the shared file lock while the broker is running.

Without a shared-store ha-policy, your primary broker might lose the shared file lock without realizing it. In that case the backup would activate, and both the primary and the backup would be operating simultaneously (i.e. split brain). Therefore, I would not recommend a simple clustered setup using shared storage without a shared-store ha-policy.

I recommend you inspect the configuration and features of the shared storage device to ensure it is able to support exclusive shared file locks. I also recommend you monitor the shared storage device to ensure there are no intermittent problems that would cause the primary broker to lose its lock.
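One way to check whether the shared volume honors exclusive file locks is a small standalone probe. The sketch below is not part of Artemis; the class name SharedLockProbe, the default file name probe.lock, and the 60-second hold time are all hypothetical. It assumes the broker's shared-store locking ultimately depends on OS-level file locks taken through the JVM, as the class name FileLockNodeManager suggests. Run it from one pod against a file on the shared PVC, then run it from a second pod while the first is still holding the lock: the second run should report that the lock is held elsewhere.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe for exclusive file locking on a shared volume; not Artemis code. */
public class SharedLockProbe {

    /**
     * Tries to take an exclusive lock on the whole file.
     * Returns the lock, or null if another process (or this JVM) already holds it.
     */
    public static FileLock tryExclusiveLock(Path file) throws IOException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
        try {
            FileLock lock = channel.tryLock(); // null when another process holds the lock
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // The lock is already held through another channel in this same JVM.
            channel.close();
            return null;
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the first argument is a path on the shared volume.
        Path file = Paths.get(args.length > 0 ? args[0] : "probe.lock");
        FileLock lock = tryExclusiveLock(file);
        if (lock == null) {
            System.out.println("Lock is held elsewhere: " + file);
            return;
        }
        System.out.println("Acquired exclusive lock on " + file + ", holding for 60s");
        Thread.sleep(60_000); // window in which to run the probe from a second pod
        lock.release();
        lock.channel().close();
        System.out.println("Released lock");
    }
}
```

If both pods report that they acquired the lock at the same time, the storage backend is probably not enforcing exclusive locks across nodes, which would be consistent with the lock problems described above. A passing probe does not rule out intermittent storage problems, so it complements monitoring rather than replacing it.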

You can enable TRACE logging for org.apache.activemq.artemis.core.server.impl.FileLockNodeManager to help you identify why the primary broker is losing its shared file lock.
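As a rough sketch, on a broker version that uses Log4j 2 for logging, this could be done by adding a logger entry to the broker instance's log4j2.properties file. The identifier file_lock_node_manager is an arbitrary label chosen here; the logger name is the full class name, org.apache.activemq.artemis.core.server.impl.FileLockNodeManager. Older broker versions use a different logging configuration file and syntax, so treat this as an assumption to check against the documentation for your version.

```properties
# Assumption: the broker reads its logging setup from etc/log4j2.properties.
# "file_lock_node_manager" is only a label that ties the two lines together.
logger.file_lock_node_manager.name=org.apache.activemq.artemis.core.server.impl.FileLockNodeManager
logger.file_lock_node_manager.level=TRACE
```

If the TRACE messages do not appear, check whether the configured appenders apply their own level filter.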
