Fix Alibaba Cloud ACK NodePool ErrImagePull After K8s Version Upgrade
Quick Fix Summary
TL;DR: Check and correct the image pull secret for the upgraded node pool.
After a Kubernetes version upgrade, nodes in a new node pool may fail to pull container images due to missing or incorrect authentication credentials for the container registry.
Diagnosis & Causes
Recovery Steps
Step 1: Verify the ErrImagePull Error
Identify the specific pod and node experiencing the image pull failure to confirm the issue is related to authentication.
kubectl get pods -A -o wide | grep -Ei 'errimagepull|imagepullbackoff'
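To see whether the failures cluster on nodes from the upgraded pool, the listing can be narrowed to a few fields and tallied per node. The following is a minimal sketch, not an official ACK tool: the function names are hypothetical, and it only inspects regular containers, so a failing init container would not show up.

```shell
# Hypothetical helper: one line per pod that is waiting on an image pull,
# printed as NAMESPACE POD NODE REASON.
list_image_pull_failures() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,NODE:.spec.nodeName,REASON:.status.containerStatuses[*].state.waiting.reason' \
    | grep -E 'ErrImagePull|ImagePullBackOff'
}

# Hypothetical helper: read the lines above on stdin and count failing pods
# per node (third column), highest count first.
count_failures_per_node() {
  awk '{ count[$3]++ } END { for (n in count) print count[n], n }' | sort -rn
}
```

Running `list_image_pull_failures | count_failures_per_node` prints a count next to each node name. If only recently added nodes appear, the node pool is the likely common factor; the describe command that follows then shows why the pull failed for a given pod.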
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 Events
Step 2: Check Default Service Account Secrets
Inspect the default service account in the affected namespace. Confirm that it still lists the imagePullSecrets your workloads relied on before the upgrade, and that the referenced secret exists in the same namespace.
kubectl describe serviceaccount default -n <namespace>
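The same check can be scripted so that a dangling reference is easy to spot. This is a sketch under the assumption that workloads use the `default` service account; the function name is hypothetical.

```shell
# Hypothetical helper: report each imagePullSecret referenced by the default
# service account in a namespace and whether that secret exists there.
check_default_sa_pull_secrets() {
  local ns="$1" names name
  names="$(kubectl get serviceaccount default -n "$ns" \
    -o jsonpath='{.imagePullSecrets[*].name}')"
  if [ -z "$names" ]; then
    echo "default service account in $ns references no imagePullSecrets"
    return 1
  fi
  for name in $names; do
    if kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
      echo "$name: present"
    else
      echo "$name: MISSING"
    fi
  done
}
```

Call it as `check_default_sa_pull_secrets <namespace>`. A `MISSING` line means the service account points at a secret that is not in the namespace; listing the namespace's secrets then shows whether an ACR secret exists under a different name.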
kubectl get secrets -n <namespace> | grep -i acr
Step 3: Patch the Default Service Account
Add the required Alibaba Cloud Container Registry (ACR) image pull secret to the default service account. Replace `<your-acr-secret-name>` with the actual secret name (e.g., `acr-credential`).
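The patch only helps if a secret with that name already exists in the namespace. If it does not, it can be created first. The sketch below uses the standard `kubectl create secret docker-registry` command; the function name is hypothetical, and the registry address, username, and password are placeholders you must supply from your own ACR instance.

```shell
# Hypothetical helper: create a docker-registry secret in a namespace unless
# a secret with that name already exists. Registry address and credentials
# are placeholders passed in by the caller.
ensure_pull_secret() {
  local ns="$1" name="$2" server="$3" user="$4" password="$5"
  if kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
    echo "secret $name already exists in $ns"
    return 0
  fi
  kubectl create secret docker-registry "$name" -n "$ns" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password"
}
```

A call looks like `ensure_pull_secret <namespace> <your-acr-secret-name> <registry-address> <username> "$ACR_PASSWORD"`, where `ACR_PASSWORD` is an assumed environment variable that keeps the password out of the shell history.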
kubectl patch serviceaccount default -n <namespace> -p '{"imagePullSecrets": [{"name": "<your-acr-secret-name>"}]}'
Step 4: Restart Affected Pods
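Delete the pods stuck in the ErrImagePull state so that their controller (Deployment, StatefulSet, or DaemonSet) recreates them with the image pull secret from the patched service account. Pods that are not managed by a controller are not recreated automatically and must be re-applied by hand.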
kubectl delete pod <pod-name> -n <namespace>
Step 5: Verify and Prevent Recurrence
Confirm that the image pull secret exists in every namespace that pulls from the registry, and review the node pool configuration after each upgrade, so that future upgrades do not break image pulls.
# Check whether the secret exists in the kube-system namespace
kubectl get secret <your-acr-secret-name> --namespace=kube-system
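Because the check above covers a single namespace, a short loop can extend it to the whole cluster. This is a sketch with a hypothetical function name; it assumes your account may list namespaces and read secrets in each of them.

```shell
# Hypothetical helper: print every namespace that does not contain the
# named secret.
namespaces_missing_secret() {
  local name="$1" ns
  for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get secret "$name" -n "$ns" >/dev/null 2>&1 || echo "$ns"
  done
}
```

`namespaces_missing_secret <your-acr-secret-name>` lists the namespaces where pods would have no such secret to reference. Not every namespace needs one, so treat the output as a review list rather than a list of errors.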
# Review ACK node pool configuration in Alibaba Cloud Console for ImageSecret.
Architect's Pro Tip
"This often happens when the node pool upgrade creates new ECS instances. The automated setup may not copy the `imagePullSecret` from the kube-system namespace to the default service account in user namespaces. Always verify the default service account post-upgrade."
Frequently Asked Questions
The secret exists in kube-system, but pods still can't pull images. Why?
Secrets are namespace-scoped. A secret in `kube-system` is not accessible to pods in other namespaces (e.g., `default`). You must either create the secret in each namespace or configure the node pool to inject it automatically.
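One way to create the secret in another namespace is to re-create it from the decoded payload of the existing one. The sketch below assumes the source secret is of type `kubernetes.io/dockerconfigjson`; the function name is hypothetical.

```shell
# Hypothetical helper: copy a dockerconfigjson pull secret from one namespace
# to another by re-creating it from its decoded payload.
copy_pull_secret() {
  local name="$1" src_ns="$2" dst_ns="$3" tmp
  tmp="$(mktemp)" || return 1
  # The data key starts with a dot, so it is escaped in the JSONPath expression.
  kubectl get secret "$name" -n "$src_ns" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode > "$tmp"
  kubectl create secret generic "$name" -n "$dst_ns" \
    --type=kubernetes.io/dockerconfigjson \
    --from-file=.dockerconfigjson="$tmp"
  local status=$?
  # Remove the temporary file, which holds registry credentials in clear text.
  rm -f "$tmp"
  return "$status"
}
```

For example, `copy_pull_secret <your-acr-secret-name> kube-system default` makes the secret available to pods in the `default` namespace. The service account there still has to reference it.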
Can I fix this without restarting all my pods?
For new pods, the fix is automatic after patching the service account. Existing pods must be restarted to pick up the new credentials. Use a rolling update or delete them individually.
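For workloads managed by Deployments, the rolling update can be triggered with `kubectl rollout restart`. The sketch below restarts every Deployment in one namespace and waits for each to settle; the function name and the five-minute timeout are illustrative choices.

```shell
# Hypothetical helper: restart every Deployment in a namespace and wait for
# each rollout to finish, so pods are replaced gradually instead of all at once.
restart_deployments() {
  local ns="$1" deploy
  for deploy in $(kubectl get deployments -n "$ns" -o name); do
    kubectl rollout restart "$deploy" -n "$ns" || return 1
    kubectl rollout status "$deploy" -n "$ns" --timeout=5m || return 1
  done
}
```

`restart_deployments <namespace>` stops at the first rollout that fails or times out, which usually means the new pods still cannot pull their image and the secret needs another look.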