java - Apache Ignite node getting failed often - Stack Overflow


I have two Apache Ignite servers with three clients already connected to them, each using separate data region configurations for their respective caches. These three clients work fine, but now, when I connect a fourth client, the node occasionally stops.

WARN 1 --- [vent-worker-#44] o.a.i.i.m.d.GridDiscoveryManager         : Node FAILED:
TcpDiscoveryNode [id=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3, consistentId=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, x.y.z.a, 127.0.0.1], sockAddrs=null, discPort=0, order=967, 
intOrder=487, lastExchangeTime=1731680665767, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true]

Whenever I get this error, my entire Spring Boot application restarts from the beginning. Why is this happening, and how can I avoid it? My configuration is below:


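import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.spi.eventstorage.memory.MemoryEventStorageSpi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class IgniteConfig {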

    @Bean
    public Ignite igniteInstance() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Set client mode
        cfg.setClientMode(true);

        // Configure discovery SPI
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
                "x.y.z.a:47500..47509"
        ));
        discoverySpi.setIpFinder(ipFinder);
        discoverySpi.setNetworkTimeout(10000); // Network timeout (10 seconds)
        discoverySpi.setJoinTimeout(10000);   // Join timeout (10 seconds)
        cfg.setDiscoverySpi(discoverySpi);

        // Set failure detection timeouts
        cfg.setFailureDetectionTimeout(120000); // 120 seconds
        cfg.setClientFailureDetectionTimeout(120000); // 120 seconds

        // Configure TCP communication SPI
        TcpCommunicationSpi spi = new TcpCommunicationSpi();
        spi.setConnectTimeout(30000); // Initial connection timeout (30 seconds)
        spi.setMaxConnectTimeout(10000); // Max connection timeout (10 seconds, lower than the connect timeout above)
        spi.setReconnectCount(3); // Number of reconnection attempts
        spi.setIdleConnectionTimeout(3000); // Idle connection timeout (3 seconds)
        cfg.setCommunicationSpi(spi);

        // Configure event logging to capture node failures and disconnections
        cfg.setIncludeEventTypes(
                EventType.EVT_NODE_FAILED,
                EventType.EVT_NODE_LEFT,
                EventType.EVT_NODE_JOINED,
                EventType.EVT_NODE_SEGMENTED
        );

        // Configure event storage for diagnostics
        MemoryEventStorageSpi eventStorageSpi = new MemoryEventStorageSpi();
        eventStorageSpi.setExpireCount(1000); // Store up to 1000 events in memory
        cfg.setEventStorageSpi(eventStorageSpi);

        // Set metrics log frequency to zero to reduce logging noise
        cfg.setMetricsLogFrequency(0);

        // Start the Ignite instance
        return Ignition.start(cfg);
    }
}

Asked Nov 21, 2024 at 10:30 by Dude Ramasamy (edited Nov 21, 2024 at 10:41)

1 Answer

First, why are you using three separate data regions? That may well be fine, but it pre-divides your memory. That is not a problem in itself unless one of your applications needs more than its slice of the pie. If all three were in the same data region, you would be limited only by total memory, since every consumer draws from the one and only pie.

As for the node failure, you need to look at the log files for indications of what went wrong. I have seen long GC pauses ultimately cause a node to crash. I can't say that is your issue, but if your logs show long GC pauses before a crash, that would be one example of a reason for node failure. Hope that helps.
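
To check the GC-pause theory from inside the application, one option is to sample the JVM's own GC counters around the time of a failure. The following is only a minimal sketch using the standard `java.lang.management` API. The class name `GcPauseProbe`, the 200 ms threshold, and the 500 ms sampling interval are assumptions made for illustration and are not part of Ignite or of the question's code. Note that accumulated collection time is only a rough proxy for pauses, since concurrent collectors do part of their work without stopping the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Ignite): samples the JVM's cumulative GC time.
public class GcPauseProbe {

    // Sums the accumulated collection time, in milliseconds, over all collectors.
    // getCollectionTime() returns -1 when a collector does not report it,
    // so non-positive values are skipped.
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True when the GC time accumulated between two samples exceeds the threshold.
    public static boolean exceedsThreshold(long beforeMillis, long afterMillis, long thresholdMillis) {
        return afterMillis - beforeMillis > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long thresholdMillis = 200; // assumed value, for illustration only
        long before = totalGcTimeMillis();
        Thread.sleep(500);          // assumed sampling interval
        long after = totalGcTimeMillis();
        System.out.println("GC time in interval: " + (after - before) + " ms");
        if (exceedsThreshold(before, after, thresholdMillis)) {
            System.out.println("GC time exceeded " + thresholdMillis + " ms in this interval");
        }
    }
}
```

In a real application the two samples would be taken periodically (for example from a scheduled task) and logged, so that the values can be lined up against the timestamp of the `Node FAILED` warning. Enabling JVM GC logging (for example `-Xlog:gc*` on JDK 9 and later) gives more precise pause information than this probe.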
