I believe I have some sort of logic issue with my auto-scaling module that is responsible for scaling my ECS tasks.

As you can see, I use dynamic scaling based on resource consumption (CPU and memory), scheduled scaling based on cron expressions, and SQS scaling based on queue length.

My main issue is with the dynamic scaling in combination with scheduled scaling.

For example, the customer has demanded that every day at 8:00 AM we pre-launch 50 instances of a specific service. Scheduled scaling works fine and we get 50 instances of said task.

The problem is that my dynamic scaling sees that these tasks are under-utilized and begins shutting them down, and about half an hour later we get overwhelmed by traffic and the services begin to crash.

I am attaching my module configuration. Most values are set outside the module, but it should give you an idea of how it works.

I'm just trying to understand whether I missed something, or whether the whole approach is wrong.

Thanks in advance to anyone who's able to assist.

resource "aws_appautoscaling_target" "auto_scaling_target" {
  for_each = var.compute_services_auto_scaling_configuration

  max_capacity       = each.value.instance_max_capacity
  min_capacity       = each.value.instance_min_capacity
  resource_id        = "service/${var.cluster_name}/${each.key}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "auto_scaling_memory_policy" {
  for_each = var.compute_services_auto_scaling_configuration

  name               = "${each.key}_auto_scaling_memory_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    target_value = each.value.memory_scaling_target_value
  }
}

resource "aws_appautoscaling_policy" "auto_scaling_cpu_policy" {
  for_each = var.compute_services_auto_scaling_configuration

  name               = "${each.key}_auto_scaling_cpu_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = each.value.cpu_scaling_target_value
  }
}

resource "aws_appautoscaling_policy" "auto_scaling_sqs_policy" {
  for_each = var.sqs_based_auto_scaling_configuration

  name               = "${each.key}_auto_scaling_sqs_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "ApproximateNumberOfMessagesVisible"
      namespace   = "AWS/SQS"
      statistic   = "Average"

      dimensions {
        name  = "QueueName"
        value = each.value.queue_name
      }
    }

    target_value        = each.value.sqs_scaling_target_value
    scale_in_cooldown   = 300
    scale_out_cooldown  = 300
  }
}

locals {
  flattened_scheduled_scaling_actions = flatten([
    for service, actions in var.compute_services_scheduled_scaling : [
      for idx, action in actions : {
        action_name   = "${service}_scheduled_scaling_${idx + 1}"
        service_name  = service
        schedule      = action.schedule_expression
        desired_count = action.scalable_action.desired_count
      }
    ]
  ])
}

resource "aws_appautoscaling_scheduled_action" "scheduled_scaling" {
  for_each = { for action in local.flattened_scheduled_scaling_actions : action.action_name => action }

  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${each.value.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  name               = each.value.action_name
  schedule           = each.value.schedule

  scalable_target_action {
    min_capacity = each.value.desired_count
    max_capacity = each.value.desired_count
  }
}
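
For comparison, here is a minimal sketch of a scheduled action that raises only the minimum capacity instead of pinning min and max to the same value. With min_capacity and max_capacity both set to 50, the service cannot scale out past 50 once the action has run. Setting only min_capacity leaves the existing maximum untouched, and target tracking cannot scale in below the new floor. The resource names, the service name "example-service", the evening schedule, and the off-peak value of 2 are all assumptions for illustration, not part of the module above.

```hcl
# Hypothetical variant of the scheduled action above: raise only the floor at
# 08:00 and leave max_capacity as configured on the scalable target.
resource "aws_appautoscaling_scheduled_action" "morning_floor" {
  name               = "example_service_morning_floor"               # hypothetical name
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/example-service" # hypothetical service
  scalable_dimension = "ecs:service:DesiredCount"
  schedule           = "cron(0 8 * * ? *)" # 08:00 every day, UTC by default

  scalable_target_action {
    min_capacity = 50 # dynamic scaling cannot scale in below this value
  }
}

# Hypothetical companion action: relax the floor again once the peak is over.
resource "aws_appautoscaling_scheduled_action" "evening_floor" {
  name               = "example_service_evening_floor"               # hypothetical name
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/example-service" # hypothetical service
  scalable_dimension = "ecs:service:DesiredCount"
  schedule           = "cron(0 20 * * ? *)" # assumed end of the busy period

  scalable_target_action {
    min_capacity = 2 # assumed off-peak minimum
  }
}
```

If the scheduled actions are the only thing that should change the floor, it may also be worth checking that nothing else resets min_capacity on the scalable target between 08:00 and the traffic peak.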

asked Jan 30 at 10:20 by Assaf

1 Answer

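One option is to set cooldown periods on the dynamic scaling policies, with a long scale-in cooldown so the pre-launched tasks are not removed before the traffic arrives:

Scale-out cooldown: 60s (react quickly to traffic)

Scale-in cooldown: 3600s (60 min)

To set this with the AWS CLI: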

aws application-autoscaling put-scaling-policy \
    --policy-name "ScaleInCooldownPolicy" \
    --service-namespace ecs \
    --resource-id service/YOUR-CLUSTER-NAME/YOUR-SERVICE-NAME \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 3600,
        "ScaleOutCooldown": 60
    }'
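
Since the question manages its policies with Terraform, the same cooldowns could be set there instead of through the CLI. This is a sketch based on the CPU policy from the question, using the same scale_in_cooldown and scale_out_cooldown arguments that the question's SQS policy already sets. The variable name var.compute_services_auto_scaling_configuration is assumed from the question's module, and the cooldown values are the ones suggested above, not tested recommendations.

```hcl
resource "aws_appautoscaling_policy" "auto_scaling_cpu_policy" {
  # Variable name assumed from the question's module.
  for_each = var.compute_services_auto_scaling_configuration

  name               = "${each.key}_auto_scaling_cpu_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value       = each.value.cpu_scaling_target_value
    scale_in_cooldown  = 3600 # seconds; slows down removal of idle tasks
    scale_out_cooldown = 60   # seconds; keeps scale-out responsive
  }
}
```

The memory policy would need the same two arguments, because each target tracking policy carries its own cooldown settings.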
