update to latest base image with containerd 2.0.2 #3848
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.

/hold

/retest
All of CI passed on the first try (ignoring a failure to schedule the CI workload itself, which is unrelated: a timing mismatch between autoscaling and the job scheduler); however, when building a node image locally:

Debugging: my current suspicion is that we need to wait for containerd to be ready, since it takes longer to start, around 1s on a fairly large cloud VM:
If I start containerd (v1.7.24) in the previous base image like this to simulate the build process:

These times are pretty representative of repeated attempts: containerd 2.0.2 takes about 3x as long to start versus v1.7.24 (NOTE: these are from the arm64 cross-build on an amd64 host). Circled back in #3828 (comment).
To replicate:
docker run -d --entrypoint=sleep --name="test-old-base" --platform=linux/arm64 --security-opt=seccomp=unconfined docker.io/kindest/base:v20241212-9f82dd49 infinity
docker run -d --entrypoint=sleep --name="test-new-base" --platform=linux/arm64 --security-opt=seccomp=unconfined docker.io/kindest/base:v20250117-f528b021 infinity
docker exec -it test-old-base containerd
docker exec -it test-new-base containerd

(then watch for the log line like "containerd successfully booted in 0.325773s" and "containerd successfully booted in 0.972910s")
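If you just want the number without watching the full logs, something like the following works (a rough sketch on top of the containers above; the 30 second timeout and the grep pattern are my additions, not from the thread):

```bash
# Start containerd in each test container and print only the first
# "successfully booted" line, which includes the startup time.
# Note: containerd keeps running inside the container afterwards, so
# repeat runs need a fresh container (or kill the old process first).
for c in test-old-base test-new-base; do
  echo "== ${c} =="
  timeout 30 docker exec "${c}" containerd 2>&1 | grep -m1 'successfully booted'
done
```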
Update: this is probably not worth discussing upstream, because 2.0.2 still starts in < 0.07s with amd64 on an amd64 host. It is, however, consistently longer than 1.7.24. Something with arm64 under qemu must be even more pathological. I think let's add image pull retries plus waiting for containerd to start. We only need to do this for pulling, not imports (we do all pulling first), and retries are a nice idea anyhow to handle transient network issues.
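A minimal shell sketch of that plan (kind's actual build code is Go; the `ctr`/`crictl` invocations, socket path, retry count, and sleep intervals below are illustrative assumptions, not what this PR implements):

```bash
# Wait for containerd to answer on its socket before doing any pulls,
# rather than assuming it is ready as soon as the process is started.
for _ in $(seq 1 30); do
  if ctr --address /run/containerd/containerd.sock version >/dev/null 2>&1; then
    break
  fi
  sleep 1
done

# Retry pulls a few times; this also papers over transient network
# issues. Imports don't need this since all pulls happen first.
pull_with_retries() {
  local image="$1"
  local attempt
  for attempt in 1 2 3; do
    if crictl pull "${image}"; then
      return 0
    fi
    echo "pull of ${image} failed (attempt ${attempt}), retrying..." >&2
    sleep 2
  done
  return 1
}
```

Wrapping only the pull step keeps imports untouched, matching the comment above.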
This PR contains the base image built from #3828 in https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/post-kind-push-base-image/1880330906284068864

TODO: node image (prow e2es will use this; GitHub Actions will use the default node image only, though I expect we are more likely to catch issues in the full Kubernetes e2e tests anyhow for this particular change)