
I have an on-premises Nomad cluster with 3 servers and 3 clients, using Consul as the service discovery mechanism. Everything works fine when I use the default bridge network mode in my job definition, as shown below:

network {
  mode = "bridge"
  port "web" {
    to = 5000
  }
}

This correctly registers the service in Consul, and the health check works without issues. Additionally, I can access the services both from within the cluster and externally.
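
For reference, this is roughly how I verify reachability in bridge mode; the client node address and the dynamically assigned host port below are placeholders:

# from any machine that can reach the client node (address and port are placeholders)
curl -v http://<client-node-ip>:<dynamic-host-port>/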

However, when I try to use a custom CNI configuration (mode = "cni/nomad-qa"), the service is still registered in Consul, but:

  • The health check fails.

  • I cannot access the services from inside the cluster.

  • I cannot access the services from outside the cluster.

Here’s the modified job definition:

job "dummy" {
  datacenters = ["dc1"]

  group "dummy" {
    count = 5

    network {
      mode = "cni/nomad-qa"
      port "web" {
        to = 5000
      }
    }

    spread {
      attribute = "${meta.availability_zone}"
      weight    = 100

      target "us-east-1a" {
        percent = 80
      }
    }

    volume "dummy-store" {
      type      = "host"
      read_only = false
      source    = "test-volume-a"
    }

    service {
      name = "dummy"
      tags = ["urlprefix-/"]
      port = "web"

      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    task "dummy" {
      driver = "docker"

      config {
        image = "dummy:flask"
        ports = ["web"]
        auth {
          server_address = "registry..."
          username       = "XXXX"
          password       = "gXXXX"
        }
      }

      volume_mount {
        volume      = "dummy-store"
        destination = "/temp"
        read_only   = false
      }

      resources {
        cpu    = 50
        memory = 256
      }
    }
  }
}

Custom CNI configuration (/opt/cni/config/nomad-net):

{
    "cniVersion": "1.0.0",
    "name": "nomad-qa",
    "plugins": [
      {
        "type": "loopback"
      },
      {
        "type": "bridge",
        "bridge": "nomad-qa",
        "ipMasq": true,
        "isGateway": true,
        "forceAddress": true,
        "hairpinMode": true,
        "ipam": {
          "type": "host-local",
          "ranges": [
            [
              {
                "subnet": "172.21.0.0/20"
              }
            ]
          ],
          "routes": [
            { "dst": "0.0.0.0/0" }
          ]
        }
      },
      {
        "type": "firewall",
        "backend": "iptables",
        "iptablesAdminChainName": "NOMAD-QA"
      },
      {
        "type": "portmap",
        "capabilities": {"portMappings": true},
        "snat": true
      }
    ]
  }
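
For context, the Nomad clients pick this file up through the standard CNI options in the client block. Below is a sketch of the relevant part of my client agent configuration; the cni_path value is an assumption based on the default plugin location:

client {
  # where the CNI plugin binaries are installed (assumed default location)
  cni_path       = "/opt/cni/bin"

  # directory containing the nomad-net configuration shown above
  cni_config_dir = "/opt/cni/config"
}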
  

I suspect the issue could be related to the network configuration in the custom CNI or how the ports are being mapped. The service appears in Consul but is unreachable.
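
To make debugging easier, these are the kinds of checks I can run on the cluster and share output from; the allocation ID and container IP below are placeholders, and the chain names come from the firewall/portmap plugins above:

# find an allocation for the job and see what address/port Nomad reports for "web"
nomad job status dummy
nomad alloc status <alloc-id>

# from the client node running the allocation, try the container IP directly
# (host-local assigns an address from 172.21.0.0/20; the exact IP is a placeholder)
curl -v http://<alloc-ip>:5000/

# check what address/port Consul actually registered for the service
dig @127.0.0.1 -p 8600 dummy.service.consul SRV

# inspect the iptables chains created by the firewall and portmap plugins
sudo iptables -S NOMAD-QA
sudo iptables -t nat -S | grep -i CNI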

Has anyone encountered a similar issue when using a custom CNI with Nomad? Any guidance on debugging or fixing this would be greatly appreciated!

Regards
