Stuck on "dial tcp i/o timeout" Error? AWS Load Balancer Controller in Kubernetes (Non-EKS) #4018

Open
ChubbyKay opened this issue Jan 15, 2025 · 2 comments
Labels
kind/documentation, triage/needs-investigation

Comments

ChubbyKay commented Jan 15, 2025

Hello, I encountered issues when setting up the AWS Load Balancer Controller in a non-EKS Kubernetes cluster. Despite multiple attempts, the controller fails to function properly. I would appreciate your assistance in diagnosing and resolving the issue.

Background Information

  • Kubernetes Version: 1.31.1 (built using Kubespray)
  • CNI Plugins: Initially Calico, later switched to amazon-vpc-cni-k8s (ECR region changed to ap-northeast-1, all aws-node Pods are in Running state).
  • Installation Method: Using Helm (version v3.16.3)
    • Command used:
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
--namespace kube-system \
--set clusterName=my-cluster \
--set region=ap-northeast-1 \
--set vpcId=vpc-xxxxxxxxxxxxxxxxx \
--set serviceAccount.create=true
  • IAM Role Configuration:
    • Policies attached:
      • AmazonEC2ContainerRegistryReadOnly
      • AmazonEKS_CNI_Policy
      • AmazonEKSClusterPolicy
      • AWSLoadBalancerControllerIAMPolicy
    • The IAM Role is bound to the nodes.
  • Subnet Tags: kubernetes.io/cluster/<cluster-name> set to owned.
  • Pod CIDR: 10.233.64.0/18
  • Security Group Rules: All opened to 0.0.0.0/0 for both ingress and egress.
  • IMDSv2 Configuration (see the verification sketch after this list):
    • HttpTokens: required
    • HttpPutResponseHopLimit: 2
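
For reference, the commands I used to confirm the IMDS options and subnet tags (a rough sketch; the instance and VPC IDs are masked placeholders):

aws ec2 describe-instances --instance-ids i-xxxxxxxxxxxxxxxxx \
  --query 'Reservations[].Instances[].MetadataOptions'
aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" \
  --query 'Subnets[].{SubnetId:SubnetId,Tags:Tags}'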

Issue Description
Problem 1
After installation, the aws-load-balancer-controller Pod fails to run properly. Logs show the following error:

{"level":"error","ts":"2025-01-15T03:38:31Z","logger":"setup","msg":"unable to create controller","controller":"Ingress","error":"Get \"https://xx.xxx.x.x:443/apis/networking.k8s.io/v1\": dial tcp xx.xxx.x.x:443: i/o timeout"}

Problem 2
In a previous attempt, I noticed that the ServiceAccount associated with the controller had no mountable tokens or secrets:

kubectl describe serviceaccount aws-load-balancer-controller -n kube-system
Name:                aws-load-balancer-controller
Namespace:           kube-system
Labels:              app.kubernetes.io/instance=aws-load-balancer-controller
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=aws-load-balancer-controller
                      app.kubernetes.io/version=v2.11.0
                      helm.sh/chart=aws-load-balancer-controller-1.11.0
Annotations:         meta.helm.sh/release-name: aws-load-balancer-controller
                      meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
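
Side note: on Kubernetes 1.24 and later, ServiceAccounts no longer get an auto-generated long-lived token Secret, so the empty Tokens/Mountable secrets fields may be expected rather than a problem in themselves; tokens are projected into the Pod at runtime. A quick check that a token can still be issued (assuming kubectl 1.24+):

kubectl create token aws-load-balancer-controller -n kube-system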

Help Needed

  1. Error Analysis: What could be causing the dial tcp xx.xxx.x.x:443: i/o timeout error? Is it related to networking, CNI, or other configurations?
  2. Installation Guidance: If there are misconfigurations, how can I fix them to make the controller work properly?
  3. Alternative Methods: Are there other ways to implement a high-availability load balancer compatible with the AWS environment?
  4. Best Practices: Any recommendations for optimal configurations or installation parameters would be greatly appreciated!
shraddhabang (Collaborator) commented

@ChubbyKay Can you try setting the HttpPutResponseHopLimit to 3 and see if it works for you?
We have seen a similar issue previously with another customer, and they fixed it by setting the HttpPutResponseHopLimit to 3.
We will investigate further in the meantime.
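
If it helps, the hop limit can be changed in place with the AWS CLI; a sketch with a placeholder instance ID:

aws ec2 modify-instance-metadata-options \
  --instance-id i-xxxxxxxxxxxxxxxxx \
  --http-tokens required \
  --http-put-response-hop-limit 3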

@shraddhabang added the triage/needs-investigation and kind/documentation labels on Jan 15, 2025
ChubbyKay commented Jan 16, 2025

@shraddhabang
Hi, thank you for your suggestion!

I updated the HttpPutResponseHopLimit to 3 as recommended:

{
    "InstanceId": "i-xxxxxxxxxxxxxxxxxxxxx",
    "InstanceMetadataOptions": {
        "State": "pending",
        "HttpTokens": "required",
        "HttpPutResponseHopLimit": 3,
        "HttpEndpoint": "enabled",
        "HttpProtocolIpv6": "disabled",
        "InstanceMetadataTags": "disabled"
    }
}

Then I uninstalled and reinstalled the AWS Load Balancer Controller using Helm. Unfortunately, the issue persists. Here's the error log:

{"level":"info","ts":"2025-01-16T01:38:26Z","msg":"version","GitVersion":"v2.11.0","GitCommit":"ba4152c1ba7c75be194d75cf343219d4aeaeb116","BuildDate":"2024-12-12T21:01:50+0000"}  
{"level":"error","ts":"2025-01-16T01:38:56Z","logger":"setup","msg":"unable to create controller","controller":"Ingress","error":"Get \"https://10.233.0.1:443/apis/networking.k8s.io/v1\": dial tcp 10.233.0.1:443: i/o timeout"}  

I’ve double-checked the other configurations:

  • VPC and Subnets: All tagged correctly for the Load Balancer Controller to discover.
  • CNI Plugin: I’m using amazon-vpc-cni-k8s, and the aws-node Pods are running without issues.

At this point, I’m unsure if the issue is related to networking (e.g., the way my Kubernetes cluster handles internal DNS/API communication) or a configuration mismatch.
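
One thing I plan to check next is whether any pod on this CNI can reach the in-cluster API service IP at all; a sketch, assuming a busybox image can be pulled into the cluster:

kubectl run api-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nc -zv -w 5 10.233.0.1 443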

Do you have any other suggestions I could try? Or is there a way to further debug the Controller's inability to connect to the Kubernetes API server?
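
In case it is useful, this is what I intend to look at for further debugging (a sketch, assuming a kubeadm-style kube-proxy DaemonSet as deployed by Kubespray; the pod name is a placeholder):

kubectl get endpoints kubernetes -n default          # real API server endpoints behind 10.233.0.1
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
kubectl -n kube-system logs <kube-proxy-pod-name> --tail=50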

Thank you again for your help!

