I have two Apache Ignite servers with three clients already connected to them, each using separate data region configurations for their respective caches. These three clients work fine, but now, when I connect a fourth client, the node occasionally stops.

WARN 1 --- [vent-worker-#44] o.a.i.i.m.d.GridDiscoveryManager         : Node FAILED:
TcpDiscoveryNode [id=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3, consistentId=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, x.y.z.a, 127.0.0.1], sockAddrs=null, discPort=0, order=967, 
intOrder=487, lastExchangeTime=1731680665767, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true]

Whenever I get this error, my entire Spring Boot application restarts from the beginning. Why is this happening, and how can I avoid it? Below is my configuration:


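import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.spi.eventstorage.memory.MemoryEventStorageSpi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration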
public class IgniteConfig {

    @Bean
    public Ignite igniteInstance() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Set client mode
        cfg.setClientMode(true);

        // Configure discovery SPI
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
                "x.y.z.a:47500..47509"
        ));
        discoverySpi.setIpFinder(ipFinder);
        discoverySpi.setNetworkTimeout(10000); // Network timeout (10 seconds)
        discoverySpi.setJoinTimeout(10000);    // Join timeout (10 seconds)
        cfg.setDiscoverySpi(discoverySpi);

        // Set failure detection timeouts
        cfg.setFailureDetectionTimeout(120000); // 120 seconds
        cfg.setClientFailureDetectionTimeout(120000); // 120 seconds

        // Configure TCP communication SPI
        TcpCommunicationSpi spi = new TcpCommunicationSpi();
        spi.setConnectTimeout(30000); // Initial connection timeout (30 seconds)
        spi.setMaxConnectTimeout(10000); // Max connection timeout (10 seconds; note: lower than the initial timeout above)
        spi.setReconnectCount(3); // Number of reconnection attempts
        spi.setIdleConnectionTimeout(3000); // Idle connection timeout (3 seconds)
        cfg.setCommunicationSpi(spi);

        // Configure event logging to capture node failures and disconnections
        cfg.setIncludeEventTypes(
                EventType.EVT_NODE_FAILED,
                EventType.EVT_NODE_LEFT,
                EventType.EVT_NODE_JOINED,
                EventType.EVT_NODE_SEGMENTED
        );

        // Configure event storage for diagnostics
        MemoryEventStorageSpi eventStorageSpi = new MemoryEventStorageSpi();
        eventStorageSpi.setExpireCount(1000); // Store up to 1000 events in memory
        cfg.setEventStorageSpi(eventStorageSpi);

        // Set metrics log frequency to zero to reduce logging noise
        cfg.setMetricsLogFrequency(0);

        // Start the Ignite instance
        return Ignition.start(cfg);
    }
}

Asked Nov 21, 2024 at 10:30 by Dude Ramasamy; edited Nov 21, 2024 at 10:41.

1 Answer

First, why are you using three separate data regions? That may well be fine, but it pre-divides your memory space. That is not a problem in itself unless one of your applications needs more than its slice of the pie. If all three were in the same data region, you would be limited only by total memory, because every consumer draws from the one and only pie.

As for the node failure, you would need to look at the log files for indications of what went wrong. I have seen long GC pauses ultimately cause a node to crash. I can't say that is your issue, but if your logs showed long GC pauses before a crash, that would be one example of a reason for node failure. Hope that helps.
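
If the logs are not conclusive, one way to check the GC-pause theory from inside the client application is a small watchdog. The following is only a sketch using the standard JDK, not anything Ignite provides: the class name `PauseWatch`, the 50 ms sampling interval, and the 500 ms threshold are all assumptions chosen for illustration. It sleeps for a short interval, measures how much longer the sleep actually took, and compares that against the GC time the JVM reports.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not part of Ignite): flags stalls that look like long JVM pauses. */
public class PauseWatch {

    /** Total milliseconds the JVM reports having spent in GC so far, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** How much longer than planned a sleep took; large values suggest the whole JVM was stalled. */
    public static long overshootMillis(long plannedMillis, long elapsedNanos) {
        return Math.max(0L, elapsedNanos / 1_000_000L - plannedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 50;   // assumed sampling interval
        final long thresholdMs = 500; // assumed threshold for a "long" pause
        long gcBefore = totalGcMillis();

        // A real application would run this loop in a daemon thread for its whole lifetime;
        // here it is bounded so the sketch terminates.
        for (int i = 0; i < 20; i++) {
            long start = System.nanoTime();
            Thread.sleep(intervalMs);
            long over = overshootMillis(intervalMs, System.nanoTime() - start);
            long gcNow = totalGcMillis();
            if (over > thresholdMs) {
                System.out.println("Possible pause of ~" + over + " ms; GC time grew by "
                        + (gcNow - gcBefore) + " ms in the same window");
            }
            gcBefore = gcNow;
        }
    }
}
```

If reported stalls line up with the timestamps of the `Node FAILED` warnings, and the GC time grows by a similar amount in the same window, that would support the GC explanation. If there are no stalls around the failures, the cause is more likely elsewhere, for example in the network between the client and the servers.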
