I am running an Azure DevOps self-hosted agent inside a container on Azure Kubernetes Service (AKS). The AKS node pool uses Standard_D16ds_v5 (16 vCPUs, 64 GB RAM, Ephemeral SSD).

Issue:

  • Even though only one ADO agent container runs on this node, CPU usage stays at 100%.
  • Android builds (Gradle-based) take a long time, despite the high-performance VM.
  • The issue persists even after restarting the pod or scaling up the nodes.

What I Checked:

  1. Kubernetes CPU Limits:
    • Initially set requests.cpu: "4" and limits.cpu: "12".
  2. Ephemeral Disk Usage:
    • Mounted /mnt into the container for the Gradle cache.
    • Set GRADLE_USER_HOME=/mnt/gradle_cache, but builds remain slow.
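
For context, the request and limit above can be translated into what the Linux CFS scheduler actually enforces. The following is a minimal sketch, assuming the default 100 ms CFS period; the helper names `parse_cpu_quantity` and `cfs_quota_us` are illustrative and not part of any Kubernetes library.

```python
from decimal import Decimal

# Assumption: the kubelet's default CFS period of 100 ms (100,000 microseconds).
CFS_PERIOD_US = 100_000


def parse_cpu_quantity(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as "4" or "500m" to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "m" means millicores: 1000m == 1 core
        return Decimal(quantity[:-1]) / Decimal(1000)
    return Decimal(quantity)


def cfs_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may consume per CFS period."""
    return int(parse_cpu_quantity(limit) * period_us)


if __name__ == "__main__":
    node_vcpus = 16   # Standard_D16ds_v5
    limit = "12"      # limits.cpu from the question
    print("quota per period (us):", cfs_quota_us(limit))
    print("share of node vCPUs:", parse_cpu_quantity(limit) / node_vcpus)
```

With `limits.cpu: "12"` the container can use at most 12 of the node's 16 vCPUs. If the monitoring tool reports usage relative to the limit rather than to the node, a build that saturates those 12 cores would show as 100%.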

Questions:

  1. Why is my container consuming 100% CPU, despite running on a high-performance VM?
  2. Could Kubernetes CPU scheduling (cgroups) be limiting performance?
  3. Is there a way to ensure the ADO agent and Gradle build utilize the ephemeral SSD optimally?
  4. Any best practices for optimizing Android builds in AKS?

Any insights would be greatly appreciated!

Asked Feb 24 at 10:59 by Vowneee; edited Feb 25 at 2:36 by Bright Ran-MSFT.

Comment:

  • How is the CPU usage when no pipeline jobs are running in the agent container, or when the agent container is stopped? – Bright Ran-MSFT, Feb 25 at 3:05

Answer:

High CPU usage in AKS clusters can have many causes, but the most common ones are related to user configuration.

You can follow the main steps below to troubleshoot high CPU usage in AKS clusters:

  1. Use the Container Insights feature of AKS to identify nodes/containers with high CPU usage.

  2. Consider implementing any of the following best practices for avoiding high CPU usage:

    • Set appropriate limits for containers: Set requests and limits so that each pod gets the appropriate Kubernetes Quality of Service (QoS) class.
    • Enable Horizontal Pod Autoscaler (HPA): Appropriate limits combined with HPA can help resolve high CPU usage.
    • Select a higher SKU: Use a larger VM SKU to handle CPU-intensive workloads.
    • Isolate system and user workloads: Create a separate node pool (other than the agentpool) for your workloads so that they do not overload the system node pool and get better performance.
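
To make the QoS point concrete, here is a minimal sketch of how Kubernetes derives a pod's QoS class from its requests and limits. The function `qos_class` is a hypothetical helper written for illustration, and it compares quantities as plain strings, which is a simplification (for example, "1" and "1000m" would be treated as different).

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Return "Guaranteed", "Burstable", or "BestEffort" for a pod."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req = requests.get(resource)
            lim = limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # Kubernetes defaults a missing request to the limit.
            effective_req = req if req is not None else lim
            # Guaranteed needs a limit for both resources, equal to the request.
            if lim is None or effective_req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    # CPU values from the question; no memory settings were given there.
    agent = {"requests": {"cpu": "4"}, "limits": {"cpu": "12"}}
    print(qos_class([agent]))
```

With `requests.cpu: "4"` and `limits.cpu: "12"`, the pod from the question is classified as Burstable. It would only become Guaranteed if CPU and memory requests were both set equal to their limits.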

For more details, you can refer to the documentation "Troubleshoot high CPU usage in AKS clusters".
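
As a complement to Container Insights, CFS throttling can also be checked from inside the agent container by reading the cgroup's `cpu.stat` file. The sketch below assumes cgroup v2, where the file is typically at `/sys/fs/cgroup/cpu.stat`; on cgroup v1 the path is different and the throttled time is reported as `throttled_time` instead of `throttled_usec`. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict

# Assumption: cgroup v2 layout as seen from inside the container.
CPU_STAT_PATH = Path("/sys/fs/cgroup/cpu.stat")


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse "key value" lines from a cpu.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    if CPU_STAT_PATH.exists():
        stats = parse_cpu_stat(CPU_STAT_PATH.read_text())
        print("periods:", stats.get("nr_periods", 0))
        print("throttled periods:", stats.get("nr_throttled", 0))
        print("throttled ratio:", throttled_ratio(stats))
    else:
        print("cpu.stat not found at", CPU_STAT_PATH)
```

A ratio that keeps growing while a Gradle build is running suggests the container is hitting its CPU limit, in which case the limit (or the number of Gradle workers) is worth revisiting. A ratio near zero points away from cgroup throttling as the cause of the slow builds.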


Article tags: gradle · Azure DevOps Agent on AKS (D16ds_v5) Shows 100% CPU Usage and Slow Android Builds · Stack Overflow