Troubleshooting RAM Policy Conflicts Causing Health Check Failures on Alibaba Cloud
Quick Fix Summary
TL;DRTemporarily grant the RAM user/role full permissions to the affected ECS instance or security group.
The RAM:DeniedBySecurityGroup error occurs when a RAM user or role lacks the necessary permissions to perform operations on a security group, which subsequently blocks health check probes from reaching the instance, causing failures.
Diagnosis & Causes
Recovery Steps
Step 1: Verify the Error and Identify the Principal
Confirm the error in Operation Logs and identify the exact RAM user, role, or service account attempting the unauthorized action.
# Check ActionTrail logs for the specific error and principal
aliyun actiontrail LookupEvents --StartTime $(date -d '1 hour ago' +%s) --EndTime $(date +%s) --EventName 'DeniedBySecurityGroup' --output cols=EventName,Username,EventSource,ApiErrorCode,RequestParameters rows Step 2: Analyze Attached RAM Policies
List all policies attached to the identified principal (user/role) to understand the effective permissions.
# For a RAM User:
aliyun ram ListPoliciesForUser --UserName <RAM_USERNAME>
# For a RAM Role:
aliyun ram ListPoliciesForRole --RoleName <ROLENAME> Step 3: Simulate Policy Evaluation
Use the Policy Simulation feature to verify which policy is denying the specific API call (e.g., AuthorizeSecurityGroup) for the health check source IP.
aliyun ram GetPolicyVersion --PolicyName <POLICY_NAME> --VersionId v1 --output file://policy.json
# Manually review the policy.json file for Deny statements related to 'ecs:AuthorizeSecurityGroup', 'ecs:RevokeSecurityGroup', or 'ecs:*'. Step 4: Check for Conflicting 'Deny' Statements
Examine the policy documents. A broad 'Deny' on ecs:* or specific security group actions will override any subsequent 'Allow'.
grep -A5 -B5 '"Effect": "Deny"' policy.json Step 5: Review System and Resource Group Policies
System policies (like AliyunECSFullAccess) or policies attached at the Resource Group level may also be in effect. List them.
aliyun ram ListPolicies --PolicyType System --output table
aliyun resourcemanager ListResourceGroups --output cols=ResourceGroupId,DisplayName rows Step 6: Resolve the Conflict
Modify the offending custom policy. Replace a broad 'Deny' with a more specific 'Allow' list, or attach a policy that explicitly allows the required health check actions.
# Create a new policy version that corrects the Deny statement. Example policy allowing necessary actions:
{
"Statement": [
{
"Effect": "Allow",
"Action": ["ecs:AuthorizeSecurityGroup", "ecs:RevokeSecurityGroup"],
"Resource": "acs:ecs:*:*:security-group/<SECURITY_GROUP_ID>"
}
],
"Version": "1"
} Step 7: Apply the New Policy and Verify
Set the new policy version as default and verify the health checks begin to pass.
aliyun ram SetDefaultPolicyVersion --PolicyName <POLICY_NAME> --VersionId v2
# Monitor ECS instance status and health check logs
aliyun ecs DescribeInstanceHealthStatus --InstanceId <INSTANCE_ID> Architect's Pro Tip
"This often happens after a 'least privilege' policy refinement where a broad 'Deny' on 'ecs:*' was added to a custom policy, inadvertently blocking the Alibaba Cloud internal health check service which requires temporary security group rule modifications."
Frequently Asked Questions
Why does the health check service need to modify my security group?
Alibaba Cloud's health check service probes your instance from specific service IPs. If your security group does not allow these IPs, the service attempts to temporarily add a rule. The RAM:DeniedBySecurityGroup error occurs when the RAM entity lacks permission for this automatic rule management.
Can I avoid granting these permissions to the health check service?
Yes. The preferred long-term fix is to manually add a permanent security group rule allowing the Alibaba Cloud health check source IP ranges (e.g., 100.64.0.0/10 for VPC) for the required probe port (e.g., TCP 80). This eliminates the service's need to modify rules, resolving the RAM permission conflict.