
I am performance-testing a microservice application built on .NET 8.0, and we have recently run into an issue: when I set a CPU limit on our pods, they begin to restart as soon as the application reaches 20 transactions per second (TPS).

I have monitored the situation with Dynatrace and various kubectl commands to check CPU and memory utilization, and I confirmed that resource usage never exceeds the configured 60% threshold; it even stays below 40% before the pods restart.

Despite my thorough investigation into this issue, I have not been able to find a solution. Any insights or guidance on how to resolve this issue would be greatly appreciated!

Please note that when I remove the CPU limit from the deployment file, the pods scale correctly and there are no restarts.
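Unlike a memory limit, hitting a CPU limit does not kill a container; the kernel only throttles it. What can cause restarts is CFS throttling slowing the app enough that liveness probes time out, which periodic metric scrapes (every 15s to 1m) easily miss. A hedged first diagnostic sketch (the pod name is a placeholder, and the cgroup path assumes cgroup v2):

```shell
# Why did the container last exit? Check "Last State", "Reason", and "Exit Code".
kubectl describe pod <my-app-pod>

# Look for probe-failure and restart events in chronological order.
kubectl get events --sort-by=.metadata.creationTimestamp

# Inspect CFS throttling counters inside the container (cgroup v2 path;
# on cgroup v1 read /sys/fs/cgroup/cpu/cpu.stat instead).
kubectl exec <my-app-pod> -- cat /sys/fs/cgroup/cpu.stat
# Rapidly growing nr_throttled / throttled_usec means the CPU limit is being
# hit in short bursts that sampled CPU metrics do not show.
```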

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image:latest
        resources:
          requests:
            memory: "1512Mi"
            cpu: "2"  # Request for CPU
          limits:
            memory: "2Gi"
            cpu: "4"  # Limit for CPU
        ports:
        - containerPort: 80
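One hedged hypothesis that fits these symptoms: if the deployment also defines a liveness probe (not shown above), throttling at the CPU limit can delay probe responses past their timeout, and the kubelet then restarts the container even though averaged CPU usage looks low. A sketch of relaxing probe timing while investigating; the /healthz path and all values here are assumptions, not taken from the original manifest:

```yaml
        livenessProbe:
          httpGet:
            path: /healthz          # assumed health endpoint
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 15
          timeoutSeconds: 5         # give a throttled container more time to reply
          failureThreshold: 6       # tolerate transient throttling spikes
```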

Asked Nov 22, 2024 by Mysterious288; edited Nov 26, 2024.
  • Unlike memory, pods aren't supposed to restart when CPU limits are reached; the CPU is only throttled. So I assume the application itself might be crashing when it has no more resources left to consume, due to an application-level issue? – M B Commented Nov 23, 2024 at 6:12
  • @MB As I said, the application works when there is no limit set, and I don't see any application-level exception logs. Let me know where to check. – Mysterious288 Commented Nov 26, 2024 at 5:59
  • Not an answer, but it could help with troubleshooting: run kubectl describe pod <NAME OF POD>. It should show the exit code from the last crash and related events. – Jason Snouffer Commented Nov 29, 2024 at 0:08
  • Also, the scraping of container metrics (CPU/memory) happens periodically (configurable, but typically every 15s to 1m). The metrics scraper can miss short-lived CPU spikes, so it is possible that CPU usage temporarily hits the limit without showing up in the scraped metrics. – Jason Snouffer Commented Nov 29, 2024 at 0:13
  • What do the pod log and the Kubernetes event log show? Does the application depend on any downstream services such as a DB, Redis, or other web services? What does the failing request look like: does it crash mid-transfer, or before the first byte is returned? Does the same request work in other cases? What does your performance test look like: does it scale requests per second, concurrency, or request complexity? Does it crash at different levels if you change the "shape" of the test? – David Thornton Commented Dec 3, 2024 at 19:38

1 Answer


You can run out of more resources than just CPU. Pick any node that is exhibiting the behavior and profile for the highest-cost request in terms of total response time, then look at the variance (standard deviation) of the request times. Cost and variance combined are prime indicators of being bound on a member of a finite resource pool, such as threads, database connections, or socket handles.

Once you have identified the highest-cost item, which is most likely the root of requests locking up resources that other requests cannot access (or must wait for), bring in your deep diagnostic tool (Dynatrace) to drill into that request: profile all of its calls and find the call with the highest cost and variance. That is likely your root problem. Optimize it!
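The "cost plus variance" ranking described above can be sketched in a few lines of statistics. This is a minimal, self-contained illustration; the endpoint names and timings are invented, and in practice the samples would come from Dynatrace or access logs:

```python
import statistics

# Hypothetical response times (seconds) per endpoint, e.g. exported from APM data.
samples = {
    "/orders":  [0.12, 0.15, 0.11, 0.14, 0.13],
    "/reports": [0.90, 2.50, 0.80, 3.10, 1.20],  # expensive and highly variable
    "/health":  [0.01, 0.01, 0.02, 0.01, 0.01],
}

def score(times):
    """Mean cost plus standard deviation: high values flag the requests most
    likely to be monopolizing a finite resource pool."""
    return statistics.mean(times) + statistics.stdev(times)

# Rank endpoints so the most suspicious one comes first.
ranked = sorted(samples, key=lambda ep: score(samples[ep]), reverse=True)
for ep in ranked:
    print(f"{ep}: mean={statistics.mean(samples[ep]):.2f}s "
          f"stdev={statistics.stdev(samples[ep]):.2f}s score={score(samples[ep]):.2f}")
```

Here "/reports" ranks first: its mean is an order of magnitude above the others and its spread is large, so it is the request to drill into first.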

Tags: docker. Title: Kubernetes pods restarting instead of scaling during a performance test. Source: Stack Overflow.