CRITICAL

Troubleshooting RAM Policy Conflicts Causing Health Check Failures on Alibaba Cloud

Quick Fix Summary

TL;DR

Temporarily grant the RAM user/role full permissions to the affected ECS instance or security group.

The RAM:DeniedBySecurityGroup error occurs when a RAM user or role lacks the permissions needed to operate on a security group. As a result, health check probes cannot reach the instance and the checks fail.

Diagnosis & Causes

  • RAM policy explicitly denies actions on security groups (e.g., ecs:AuthorizeSecurityGroup).
  • RAM policy is missing required permissions for security group management, conflicting with health check service requirements.

Recovery Steps

    Step 1: Verify the Error and Identify the Principal

    Confirm the error in Operation Logs and identify the exact RAM user, role, or service account attempting the unauthorized action.

    bash
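    # Check ActionTrail logs for the specific error and principal.
    # StartTime and EndTime are ISO 8601 UTC timestamps (GNU date syntax shown).
    aliyun actiontrail LookupEvents --StartTime "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" --EndTime "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --EventName 'DeniedBySecurityGroup'
    # In the JSON output, check the event name, user identity, event source, error code, and request parameters.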

    Step 2: Analyze Attached RAM Policies

    List all policies attached to the identified principal (user/role) to understand the effective permissions.

    bash
    # For a RAM User:
    aliyun ram ListPoliciesForUser --UserName <RAM_USERNAME>
    # For a RAM Role:
    aliyun ram ListPoliciesForRole --RoleName <ROLENAME>

    Step 3: Simulate Policy Evaluation

    Work out which policy denies the specific API call (e.g., AuthorizeSecurityGroup) for the health check source IP. The command below downloads a policy version so that you can walk through its statements by hand.

    bash
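    # Save the policy version locally. --PolicyType is required (Custom or System).
    aliyun ram GetPolicyVersion --PolicyType Custom --PolicyName <POLICY_NAME> --VersionId v1 > policy.json
    # The policy text is in the PolicyVersion.PolicyDocument field of policy.json.
    # Review it for Deny statements related to 'ecs:AuthorizeSecurityGroup', 'ecs:RevokeSecurityGroup', or 'ecs:*'.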

    Step 4: Check for Conflicting 'Deny' Statements

    Examine the policy documents. A broad 'Deny' on ecs:* or specific security group actions will override any subsequent 'Allow'.

    bash
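    # Tolerates varying whitespace and the escaped quotes (\") that appear when the document is embedded as a JSON string
    grep -E -A5 -B5 '\\?"Effect\\?"[[:space:]]*:[[:space:]]*\\?"Deny' policy.json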
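
Because an explicit Deny takes precedence over any Allow, it can help to check every attached policy file for Deny statements before reading them in detail. The following is a minimal sketch, not a full policy evaluator: it only counts Deny effects with a text match, and the function name `count_deny_statements` and the file names in the usage comment are hypothetical.

```bash
# Count "Effect": "Deny" occurrences in a saved policy file.
# The pattern accepts optional whitespace around the colon and optional
# backslash-escaped quotes, so it works on a raw policy document as well as
# on a CLI response that embeds the document as a JSON string.
count_deny_statements() {
  grep -Eo '\\?"Effect\\?"[[:space:]]*:[[:space:]]*\\?"Deny\\?"' "$1" | wc -l | tr -d '[:space:]'
}

# Usage (hypothetical file names):
#   for f in policy-a.json policy-b.json; do
#     echo "$f: $(count_deny_statements "$f") Deny statement(s)"
#   done
```

A count above zero only tells you where to look; you still need to read the matching statement's Action and Resource to see whether it covers the security group calls.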

    Step 5: Review System and Resource Group Policies

    System policies (like AliyunECSFullAccess) or policies attached at the Resource Group level may also be in effect. List them.

    bash
    # Default JSON output is used; the aliyun CLI's --output flag expects cols=/rows= expressions, not 'table'
    aliyun ram ListPolicies --PolicyType System
    aliyun resourcemanager ListResourceGroups

    Step 6: Resolve the Conflict

    Modify the offending custom policy. Replace a broad 'Deny' with a more specific 'Allow' list, or attach a policy that explicitly allows the required health check actions.

    Create a new policy version that corrects the Deny statement. Example policy allowing the necessary actions:

    json
    {
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ecs:AuthorizeSecurityGroup", "ecs:RevokeSecurityGroup"],
          "Resource": "acs:ecs:*:*:securitygroup/<SECURITY_GROUP_ID>"
        }
      ],
      "Version": "1"
    }
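
The step above describes creating a new policy version but does not show a command for it. One way to do this is the RAM `CreatePolicyVersion` call, sketched below under some assumptions: the corrected policy is saved in a local file, and the function name `create_policy_version`, the policy name, and the file name are hypothetical. The sketch defaults to a dry run that only prints the command.

```bash
# Create a new version of a custom RAM policy from a local JSON file.
# DRY_RUN defaults to 1, so the command is printed rather than executed;
# set DRY_RUN=0 to actually call the aliyun CLI.
create_policy_version() {
  policy_name="$1"
  policy_file="$2"
  policy_doc="$(cat "$policy_file")"
  set -- aliyun ram CreatePolicyVersion \
    --PolicyName "$policy_name" \
    --PolicyDocument "$policy_doc"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (hypothetical names):
#   create_policy_version MyCustomPolicy fixed-policy.json
#   DRY_RUN=0 create_policy_version MyCustomPolicy fixed-policy.json
```

The new version is not made the default here, so the existing policy stays in effect until you switch versions explicitly.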

    Step 7: Apply the New Policy and Verify

    Set the new policy version as default and verify the health checks begin to pass.

    bash
    aliyun ram SetDefaultPolicyVersion --PolicyName <POLICY_NAME> --VersionId v2
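    # Check the ECS instance status (DescribeInstanceStatus takes indexed InstanceId.N parameters)
    aliyun ecs DescribeInstanceStatus --RegionId <REGION_ID> --InstanceId.1 <INSTANCE_ID>
    # Then confirm the probes pass in the service that runs them, e.g. for an SLB instance:
    aliyun slb DescribeHealthStatus --LoadBalancerId <LOAD_BALANCER_ID>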

    Architect's Pro Tip

    "This often happens after a 'least privilege' policy refinement where a broad 'Deny' on 'ecs:*' was added to a custom policy, inadvertently blocking the Alibaba Cloud internal health check service which requires temporary security group rule modifications."

    Frequently Asked Questions

    Why does the health check service need to modify my security group?

    Alibaba Cloud's health check service probes your instance from specific service IPs. If your security group does not allow these IPs, the service attempts to temporarily add a rule. The RAM:DeniedBySecurityGroup error occurs when the RAM entity lacks permission for this automatic rule management.

    Can I avoid granting these permissions to the health check service?

    Yes. The preferred long-term fix is to manually add a permanent security group rule allowing the Alibaba Cloud health check source IP ranges (e.g., 100.64.0.0/10 for VPC) for the required probe port (e.g., TCP 80). This eliminates the service's need to modify rules, resolving the RAM permission conflict.
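
The permanent rule described above could be added with the ECS `AuthorizeSecurityGroup` call. The sketch below assumes a TCP probe and uses the 100.64.0.0/10 range from the answer as the default source; the function name `allow_health_check_cidr` and the region, security group ID, and port in the usage comment are placeholders. It defaults to a dry run that only prints the command.

```bash
# Build the aliyun CLI call that allows the health check source range
# to reach the probe port. DRY_RUN defaults to 1 (print only);
# set DRY_RUN=0 to execute the command.
allow_health_check_cidr() {
  region="$1"
  sg_id="$2"
  port="$3"
  cidr="${4:-100.64.0.0/10}"
  set -- aliyun ecs AuthorizeSecurityGroup \
    --RegionId "$region" \
    --SecurityGroupId "$sg_id" \
    --IpProtocol tcp \
    --PortRange "$port/$port" \
    --SourceCidrIp "$cidr"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (placeholder values):
#   allow_health_check_cidr cn-hangzhou sg-xxxxxxxx 80
```

Confirm the source range and probe port for your own health check configuration before applying the rule, since both depend on the service that performs the checks.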
