(oidc): add considerations for impacted kube-apiserver admission plugins #1726
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from ba8e816 to 8246eb8.
This will be done through updates to the appropriate config observers to update the `KubeAPIServerConfig.apiServerArguments` map to:

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument (a sketch of this argument handling follows below)
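To make the intended argument handling concrete, here is a minimal, hypothetical sketch of a helper a config observer might use to move the plugins between the two lists; the package and function names are assumptions for illustration, not the actual cluster-kube-apiserver-operator code:

```go
// Hypothetical illustration: move the OAuth-specific admission plugins from
// the enabled list to the disabled list in the observed kube-apiserver config.
// Names and structure are assumptions, not the real config observer code.
package observer

var oauthAdmissionPlugins = []string{
	"authorization.openshift.io/RestrictSubjectBindings",
	"authorization.openshift.io/ValidateRoleBindingRestriction",
}

// adjustAdmissionPlugins returns new enable/disable argument values with the
// OAuth-specific plugins removed from "enabled" and appended to "disabled".
func adjustAdmissionPlugins(enabled, disabled []string) (newEnabled, newDisabled []string) {
	drop := map[string]bool{}
	for _, p := range oauthAdmissionPlugins {
		drop[p] = true
	}

	for _, p := range enabled {
		if !drop[p] {
			newEnabled = append(newEnabled, p)
		}
	}

	newDisabled = append(newDisabled, disabled...)
	for _, p := range oauthAdmissionPlugins {
		if !contains(newDisabled, p) {
			newDisabled = append(newDisabled, p)
		}
	}
	return newEnabled, newDisabled
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}
```

The resulting slices would then be written back into the `KubeAPIServerConfig.apiServerArguments` map under the `enable-admission-plugins` and `disable-admission-plugins` keys, which is where the merge behavior discussed in the following comments comes in.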
For clarity: AFAIU normally it would be enough to remove the plugins from the `--enable-admission-plugins` arg, as they are not default plugins that need explicit disabling. However, the config observer doesn't have access to the final config object and therefore the `--enable-admission-plugins` field, so we'll use the `--disable-admission-plugins` arg to indicate what needs disabling. We'll also need a special merge so that the plugins get removed from the enabled list and added to the disabled one.
IIRC from my experimenting, overriding the `--enable-admission-plugins` value in the config to no longer include these admission plugins did not sufficiently disable them, which is why I specifically call out adding them to the `--disable-admission-plugins` flag.
I'm not sure we need to go into the exact semantics of how this is achieved, but if we do I'm happy to do a bit more digging and figure out what changes may need to be made to the config logic.
Agreed, no need to go into more detail here; I just added this as a note to ourselves, following some digging I did.
This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC.
Suggested change:
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been rolled out.
If we remove the CRD the moment the auth type becomes OIDC, we won't give admins time to react in case any RBRs exist, as the CRD (and therefore any existing resources) will be removed immediately. I believe we'll want this in two steps: the CAO complains if RBRs exist and doesn't proceed with the OIDC rollout. Once they are deleted, the OIDC rollout proceeds. Once it is completed and OIDC is available (we'll use the new API field for that), OAuth cleanup starts, which includes deleting the CRD.
For the moment, this is the condition used to determine when OIDC has been enabled: https://github.com/openshift/cluster-authentication-operator/pull/740/files#diff-51c6cd196c758006bbe84eed012e6baac4713a856a96b7dfd10adc8ad7986e48R20
When we have the new API though, we'll use that to determine that it's available (i.e. `Available=True`). The KAS-o config observer will make sure to update the status accordingly when it detects that the KAS pods have been rolled out with OIDC.
The OIDC authentication mode on the cluster will not be allowed to be enabled if any `RoleBindingRestriction` resources exist.

To communicate the reason that enablement of the OIDC functionality is blocked, the `Authentication` API will be extended with a new status field describing the condition of the OIDC feature.
Let's discuss further how we'll communicate this; for example, we can set `Available=False`/`Degraded=True` when RBRs exist. We'll also need to take care of some corner cases, e.g. what if someone creates RBRs after the CAO has started the rollout, but before the KAS pods have restarted?
+1 to discussing further how we communicate this. I'll go into a bit more detail on this and then we can refine it from there.
For the corner case where a RBR is created after the CAO has already started the rollout process but before the KAS pods have restarted, my expectation is that we remove the CRD, which in turn deletes the CRs (in this case the newly created RBRs). We can discuss this a bit further if we think that this is an unacceptable user experience, but I think this would be OK for now. We could add warnings in the OpenShift documentation for enabling OIDC that any RBRs created during the rollout of the OIDC functionality will be automatically removed.
> my expectation is that we remove the CRD, which in turn deletes the CRs
I also think this sounds good enough for now 👍
/lgtm
Holding until update commits are squashed.
Force-pushed from 6a23a52 to 611371b.
Signed-off-by: Bryce Palmer <[email protected]>
Force-pushed from 611371b to 897ae74.
/lgtm
/hold cancel
In order to prevent misleading logs about informers that failed to start, or about failures to connect to the oauth-apiserver, the following changes to this patch are to be made:

- Informers for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This means the informer will not be configured, or attempt to start, when the admission plugin is disabled (illustrated in the sketch after this list).
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.spec.type` is set to `OIDC`. This will prevent logs in the kube-apiserver associated with not being able to connect to the oauth-apiserver, which we know should not be running when OIDC is enabled.
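As a rough illustration of the lazy-informer idea in the first bullet above, a plugin could defer constructing and starting the informer until its first validation call. The types and method names below are simplified placeholders, not the actual RestrictSubjectBindings implementation:

```go
// Hypothetical sketch of deferring informer startup until the admission plugin
// actually runs; types and wiring are simplified placeholders.
package admission

import (
	"context"
	"sync"
)

// groupCache stands in for the Group informer wiring; the real plugin would
// use a generated informer/lister for the groups.user.openshift.io API.
type groupCache interface {
	Start(stopCh <-chan struct{})
	WaitForSync(stopCh <-chan struct{}) bool
}

type restrictSubjectBindings struct {
	startOnce sync.Once
	groups    groupCache
	stopCh    <-chan struct{}
}

// Validate lazily starts the Group informer on first use, so a disabled
// plugin never constructs or starts it (and never logs informer start
// failures or oauth-apiserver connection errors).
func (p *restrictSubjectBindings) Validate(ctx context.Context) error {
	p.startOnce.Do(func() {
		p.groups.Start(p.stopCh)
		p.groups.WaitForSync(p.stopCh)
	})
	// ... the actual RoleBindingRestriction evaluation would happen here ...
	return nil
}
```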
The kube-apiserver should avoid using the current state of API resources to control its behavior. Doing so means that manipulation via the API changes runtime behavior in a way that leaves us unable to be confident about when a cluster is or is not enforcing a behavior. Additionally, it means that code must react dynamically, which is generally more complex to build.
Instead, the admission chain can be configured by the operator.
Not controlling the behavior of the kube-apiserver with api resources makes sense to me. I don't think configuration of the admission chain itself will be sufficient to skip this post-start hook.
Would checking for all the criteria (apiserver flags, etc.) to know that the kube-apiserver is not using the openshift oauth-apiserver for auth decisions be possible and sufficient?
Updated in c32abf3 with something that I thought would be sufficient, based on information available to the kube-apiserver without relying on other API resources.
Another thought: add a new flag, set by the cluster operators, for enabling/disabling the post-start hook.
- Disable the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins
- Remove the `rolebindingrestrictions.authorization.openshift.io` CustomResourceDefinition
- Block OIDC enablement while any `RoleBindingRestriction` resources exist; this will be communicated in the `Authentication` resource via the `OIDCConfig` status field
If we did this, how would an HCP user be aware?
Does HCP already alert users on cluster operator degraded conditions? This status condition on the `Authentication` resource is in addition to the existing cluster-authentication-operator cluster operator conditions (I don't recall the exact resource type off the top of my head).
##### Changes to the cluster-kube-apiserver-operator

When the authentication type is set to OIDC, the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins will be disabled.
Be specific. Is this spec or status? Can you describe what should happen in the transition case?
Updated in c32abf3.
This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been successfully rolled out.
Thinking through cases, is it actually (see the sketch after this list):
- internal oauth server is desired, ensure crd/rolebindingrestriction
- internal oauth server is configured and not desired and any rolebindingrestrictions currently exist, ensure crd/rolebindingrestriction. This supports a migration case to avoid removing groups and users while the tokens are still honored
- internal oauth server is configured and not desired and no rolebindingrestrictions currently exist, remove crd/rolebindingrestriction. This supports a migration case.
- internal oauth server is not configured and not desired, remove crd/rolebindingrestriction
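A minimal sketch of that case analysis, with hypothetical inputs standing in for the Authentication spec, the currently observed kube-apiserver configuration, and a RoleBindingRestriction lister (not actual operator code):

```go
// Hypothetical sketch of the CRD management decision described above.
package cao

// shouldEnsureRBRCRD captures the four cases: "oauthDesired" reflects whether
// the internal OAuth server is desired (Authentication spec), "oauthConfigured"
// whether the kube-apiserver is still configured for it, and "rbrExist" whether
// any RoleBindingRestriction resources currently exist.
func shouldEnsureRBRCRD(oauthDesired, oauthConfigured, rbrExist bool) bool {
	switch {
	case oauthDesired:
		// Internal OAuth server is desired: keep the CRD in place.
		return true
	case oauthConfigured && rbrExist:
		// Migration in progress and restrictions still exist: keep the CRD so
		// nothing is removed while OAuth tokens are still honored.
		return true
	case oauthConfigured && !rbrExist:
		// Migration in progress with nothing to preserve: remove the CRD.
		return false
	default:
		// OAuth neither configured nor desired: remove the CRD.
		return false
	}
}
```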
Yeah, that all sounds right. I'll update this section to be more clear on the different scenarios.
Updated in c32abf3
Additionally, the CAO will be updated to block OIDC configuration when `RoleBindingRestriction` resources exist. If `RoleBindingRestriction` resources are found, the Authentication CR's `OIDCConfig` status field will be updated to contain the following conditions:

- Condition: `Progressing`, Status: `False`, Reason: `Blocked`, Message: `OIDC configuration blocked: RoleBindingRestriction resources found` (see the sketch below)
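As a sketch of how the CAO could set that condition, assuming the proposed (not-yet-existing) `OIDCConfig` status field and the standard apimachinery condition helpers; note the discussion below suggests `Degraded` may end up being more appropriate than `Progressing`:

```go
// Hypothetical sketch only; the OIDCConfig status field does not exist yet.
package cao

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setOIDCBlockedCondition records the blocking condition on the proposed
// OIDCConfig status conditions when RoleBindingRestriction resources remain.
func setOIDCBlockedCondition(conditions *[]metav1.Condition, rbrCount int) {
	if rbrCount == 0 {
		return
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Progressing",
		Status:  metav1.ConditionFalse,
		Reason:  "Blocked",
		Message: "OIDC configuration blocked: RoleBindingRestriction resources found",
	})
}
```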
Can't force Progressing to false; it'll go progressing for other reasons. Instead, use Degraded.
This is for a non-cluster-operator-related resource (and by this I mean that when the cluster operator sync fails, this resource shouldn't be updated). This API field doesn't currently exist and therefore nothing is setting it to Progressing. I'm fine with changing this to Degraded if you think that makes the most sense for this case, but we were trying to be cautious about setting Degraded unless the cluster is actually in a broken state.
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions"`
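For context, this field would presumably sit on a new status type along the following lines; only the Conditions field and its markers are quoted from the diff above, while the struct name, comments, and extra struct tags are illustrative assumptions:

```go
// Hypothetical sketch, not a finalized openshift/api definition.
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// OIDCConfigStatus is an illustrative placeholder for the proposed status type.
type OIDCConfigStatus struct {
	// conditions communicates the state of the OIDC rollout, e.g. whether the
	// kube-apiserver revisions carrying the OIDC configuration have rolled out.
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions" patchStrategy:"merge" patchMergeKey:"type"`
}
```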
Rather than conditions, do we really want to know:
- required: is the internal oauth server active
- optional: is the external OIDC configured. (what even knows this? nothing?)
We are trying to know when the new revisions of the kube-apiserver have successfully rolled out where it is interacting directly with the external OIDC provider for authn decisions.
We thought the conditions pattern would allow:
- cluster-authentication-operator to communicate to users progressing status of the oidc configuration
- cluster-kube-apiserver-operator to communicate to the cluster-authentication-operator and users when the apiserver rollout with this configuration was successful and ready to be used
- In the case of the cluster-authentication-operator noticing that the rollout was successful, it would begin to remove the oauth workloads and resources from the cluster
I don't think we care about the state of the oauth server unless something goes wrong tearing it down.
We can deduce whether external OIDC has been configured by the following:
- is there a revisioned OIDC structured auth configmap for each observed current revision of the KAS pods?
- is the respective structured auth KAS CLI arg enabled for each observed current revision of the KAS pods?
- are the respective OAuth KAS CLI args disabled for each observed current revision of the KAS pods?
If all the above are true, we can deduce that external OIDC has been configured and rolled out.
If at least one of the KAS pods is on a revision that does not include an OIDC specific config, there is a rollout in progress which is either enabling, or disabling OIDC.
The KAS-o can monitor the rollout status of the KAS pods and update the `OIDCConfig.Conditions` accordingly (a rough sketch of this check follows).
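A rough sketch of that per-revision deduction, with placeholder inputs standing in for the revisioned configmaps and observed kube-apiserver arguments; the flag name mentioned in the comments is an assumption based on upstream structured authentication configuration:

```go
// Hypothetical sketch of deducing whether external OIDC is fully rolled out.
package kaso

// revisionState is a placeholder for per-revision data the KAS-o would gather
// from revisioned configmaps and the observed kube-apiserver arguments.
type revisionState struct {
	hasOIDCAuthConfigMap bool // revisioned structured auth configmap exists for this revision
	structuredAuthArgSet bool // structured auth CLI arg (assumed --authentication-config) points at it
	oauthArgsDisabled    bool // OAuth-specific authentication CLI args are not set
}

// oidcRolledOut returns true only if every currently observed kube-apiserver
// revision carries the OIDC configuration; otherwise a rollout is still in
// progress, either enabling or disabling OIDC.
func oidcRolledOut(revisions []revisionState) bool {
	if len(revisions) == 0 {
		return false
	}
	for _, r := range revisions {
		if !r.hasOIDCAuthConfigMap || !r.structuredAuthArgSet || !r.oauthArgsDisabled {
			return false
		}
	}
	return true
}
```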
Signed-off-by: Bryce Palmer <[email protected]>
New changes are detected. LGTM label has been removed.
Signed-off-by: Bryce Palmer <[email protected]>
Signed-off-by: Bryce Palmer <[email protected]>
@everettraven: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Updates the original OIDC enhancement proposal to add some considerations for how we resolve an issue with the OpenShift default authorization.openshift.io/RestrictSubjectBindings admission plugin when enabling OIDC.