(oidc): add considerations for impacted kube-apiserver admission plugins #1726

Open
wants to merge 4 commits into
base: master

@everettraven everettraven commented Dec 9, 2024

Updates the original OIDC enhancement proposal to add some considerations for how we resolve an issue with the OpenShift default authorization.openshift.io/RestrictSubjectBindings admission plugin when enabling OIDC.

Contributor

openshift-ci bot commented Dec 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 9, 2024
Contributor

openshift-ci bot commented Dec 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joepvd for approval. For more information see the Code Review Process.

@everettraven everettraven force-pushed the update/external-oidc-apiserver-impact branch from ba8e816 to 8246eb8 Compare December 18, 2024 21:01
@everettraven everettraven changed the title wip: add considerations for kube-apiserver admission plugins when ext… (oidc): add considerations for impacted kube-apiserver admission plugins Dec 18, 2024
@everettraven everettraven marked this pull request as ready for review December 18, 2024 21:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2024
This will be done through updates to the appropriate config observers to update the `KubeAPIServerConfig.apiServerArguments` map to:

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument
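The two bullets above can be sketched as one merge step over the arguments map. This is only an illustration, not the operator's actual config observer: `disablePlugins` is a hypothetical helper, the map is modelled as a plain `map[string][]string`, and the keys `enable-admission-plugins` and `disable-admission-plugins` (without leading dashes) are assumed.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin names taken from the proposal text.
const (
	restrictSubjectBindings        = "authorization.openshift.io/RestrictSubjectBindings"
	validateRoleBindingRestriction = "authorization.openshift.io/ValidateRoleBindingRestriction"
)

// disablePlugins is a hypothetical helper. It returns a copy of args in which
// every named plugin is absent from "enable-admission-plugins" and present
// exactly once in "disable-admission-plugins". The input map is not modified.
func disablePlugins(args map[string][]string, plugins ...string) map[string][]string {
	drop := make(map[string]bool, len(plugins))
	for _, p := range plugins {
		drop[p] = true
	}

	// Copy the map and its slices so the caller's data stays untouched.
	out := make(map[string][]string, len(args))
	for k, v := range args {
		out[k] = append([]string(nil), v...)
	}

	// Keep only the enabled plugins that are not being disabled.
	enabled := []string{}
	for _, p := range out["enable-admission-plugins"] {
		if !drop[p] {
			enabled = append(enabled, p)
		}
	}
	out["enable-admission-plugins"] = enabled

	// Add each plugin to the disabled list, skipping duplicates.
	disabled := out["disable-admission-plugins"]
	seen := make(map[string]bool, len(disabled))
	for _, p := range disabled {
		seen[p] = true
	}
	for _, p := range plugins {
		if !seen[p] {
			disabled = append(disabled, p)
			seen[p] = true
		}
	}
	sort.Strings(disabled) // stable output regardless of argument order
	out["disable-admission-plugins"] = disabled
	return out
}

func main() {
	args := map[string][]string{
		"enable-admission-plugins": {"NodeRestriction", restrictSubjectBindings, validateRoleBindingRestriction},
	}
	merged := disablePlugins(args, restrictSubjectBindings, validateRoleBindingRestriction)
	fmt.Println("enabled: ", merged["enable-admission-plugins"])
	fmt.Println("disabled:", merged["disable-admission-plugins"])
}
```

Returning a copy rather than mutating in place keeps the sketch safe to call from an observer that may be handed a shared configuration object.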
Member

For clarity: AFAIU, it would normally be enough to remove the plugins from the --enable-admission-plugins arg, since they are not default plugins that need explicit disabling. However, the config observer doesn't have access to the final config object, and therefore not to the --enable-admission-plugins field, so we'll use --disable-admission-plugins to indicate what needs disabling. We'll also need a special merge so that each plugin gets removed from the enabled list and added to the disabled list.

Author

IIRC from my experimenting, overriding the --enable-admission-plugins in the config to no longer include these admission plugins did not sufficiently disable them and is why I specifically call out adding them to the --disable-admission-plugins flag.

I'm not sure we need to go into the exact semantics of how this is achieved, but if we do I'm happy to do a bit more digging to figure out what changes may need to be made to the config logic.

Member

Agreed, no need to go into more detail here; I just added this note as a result of some digging I did, as a note to ourselves.


This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present present on the cluster when the authentication type _is_ OIDC.
Member

Suggested change
Current: Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present present on the cluster when the authentication type _is_ OIDC.
Suggested: Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been rolled out.

If we remove the CRD the moment the auth type becomes OIDC, we won't give time to the admins to react in case any RBRs exist, as the CRD will be removed immediately (and therefore any existing resources). I believe we'll want this in two steps: CAO complains if RBRs exist, and doesn't proceed with OIDC rollout. Once they are deleted, OIDC rollout proceeds. Once it is completed and OIDC is available (we'll use the new API field for that), OAuth cleanup starts, which includes deleting the CRD.

For the moment, this is the condition used to determine when OIDC has been enabled: https://github.com/openshift/cluster-authentication-operator/pull/740/files#diff-51c6cd196c758006bbe84eed012e6baac4713a856a96b7dfd10adc8ad7986e48R20

When we'll have the new API though, we'll use that to determine that it's available (i.e. Available=True). The KAS-o config observer will make sure to update the status accordingly when it detects that the KAS pods have been rolled out with OIDC.


The OIDC authentication mode on the cluster will not be allowed to be enabled if any `RoleBindingRestriction` resources exist.

To communicate the reason for the enablement of the OIDC functionality being blocked, the `Authentication` API will be extended with a new status field to communicate the condition of the OIDC feature.
Member

Let's discuss further how we'll communicate this; for example, we can set Available=False/Degraded=True when RBRs exist. We'll need to also take care of some corner cases, e.g. what if someone creates RBRs after the CAO has started the rollout, but before the KAS pods have restarted?

Author

+1 to discussing further how we communicate this. I'll go into a bit more detail on this and then we can refine it from there.

For the corner case where an RBR is created after the CAO has already started the rollout process but before the KAS pods have restarted, my expectation is that we remove the CRD, which in turn deletes the CRs (in this case the newly created RBRs). We can discuss this further if we think it is an unacceptable user experience, but I think this would be OK for now. We could add warnings to the OpenShift documentation for enabling OIDC, stating that any RBRs created during the rollout of the OIDC functionality will be removed automatically.

Member

my expectation is that we remove the CRD, which in turn deletes the CRs

I also think this sounds good enough for now 👍

@everettraven everettraven requested a review from liouk January 8, 2025 14:21
Member

liouk commented Jan 10, 2025

/lgtm

Holding until update commits are squashed.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
@everettraven everettraven force-pushed the update/external-oidc-apiserver-impact branch from 6a23a52 to 611371b Compare January 10, 2025 13:33
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
@everettraven everettraven force-pushed the update/external-oidc-apiserver-impact branch from 611371b to 897ae74 Compare January 10, 2025 13:38
Member

liouk commented Jan 10, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2025
Member

liouk commented Jan 10, 2025

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 10, 2025
In order to prevent misleading logs about informers that failed to start or failure to connect to the oauth-apiserver, the following changes to this patch are to be made:

- Informers for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This makes it such that the informer will not be configured or attempt to start when the admission plugin is disabled.
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.spec.type` is set to `OIDC`. This will prevent logs in the kube-apiserver associated with not being able to connect to the oauth-apiserver, which we know should not be running when OIDC is enabled.
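The first bullet (configuring and starting the informer only on the first validation run) could be sketched with `sync.Once`. This is a minimal illustration under assumptions: `groupInformer` is a local stand-in for a real shared informer, and the type and method names are hypothetical rather than taken from the actual patch.

```go
package main

import (
	"fmt"
	"sync"
)

// groupInformer is a hypothetical stand-in for an informer on the Group API.
type groupInformer struct{ started bool }

func (i *groupInformer) start() { i.started = true }

// restrictSubjectBindings models the admission plugin. Nothing is configured
// at construction time, so a disabled plugin never touches the informer.
type restrictSubjectBindings struct {
	once       sync.Once
	informer   *groupInformer
	startCount int // counts informer starts, for demonstration only
}

// Validate configures and starts the informer on its first invocation and
// reuses it on every later call.
func (p *restrictSubjectBindings) Validate() error {
	p.once.Do(func() {
		p.informer = &groupInformer{}
		p.informer.start()
		p.startCount++
	})
	// The real validation against Group data would run here.
	return nil
}

func main() {
	p := &restrictSubjectBindings{}
	fmt.Println("informer configured before first Validate:", p.informer != nil)
	_ = p.Validate()
	_ = p.Validate() // the second call does not start the informer again
	fmt.Println("informer started:", p.informer.started, "starts:", p.startCount)
}
```

Because the admission chain never calls `Validate` on a disabled plugin, the informer is never created in that case, which is the property the bullet asks for.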
Contributor

The kube-apiserver should avoid using the current state of API resources to control its behavior. Doing so means that manipulating resources via the API changes runtime behavior, so we cannot be confident about when a cluster is or is not enforcing that behavior. It also means the code must react dynamically, which is generally more complex to build.

Contributor

Instead, the admission chain can be configured by the operator

Author

Not controlling the behavior of the kube-apiserver with api resources makes sense to me. I don't think configuration of the admission chain itself will be sufficient to skip this post-start hook.

Would checking for all the criteria (apiserver flags, etc.) to know that the kube-apiserver is not using the openshift oauth-apiserver for auth decisions be possible and sufficient?

Author

Updated with something that I thought would be sufficient based on information available to the kube-apiserver without relying on other api resources in c32abf3

Author

Another thought - add a new flag for enabling/disabling the post-start hook that is set by the cluster operators


- Disable the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins
- Remove the `rolebindingrestrictions.authorization.openshift.io` CustomResourceDefinition
- Block OIDC enablement while any `RoleBindingRestriction` resources exist; this will be communicated in the `Authentication` resource via the `OIDCConfig` status field
Contributor

if we did this, how would an HCP user be aware?

Author

Does HCP already alert users on cluster operator degraded conditions? This status condition on the Authentication resource is in addition to the existing cluster-authentication-operator cluster operator conditions (don't recall the exact resource type off the top of my head)


##### Changes to the cluster-kube-apiserver-operator

When authentication type is set to OIDC, the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins will be disabled.
Contributor

Be specific. Is this spec or status? Can you describe what should happen in the transition case?

Author

updated in c32abf3


This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been successfully rolled out.
Contributor

thinking through cases, is it actually

  1. internal oauth server is desired, ensure crd/rolebindingrestriction
  2. internal oauth server is configured and not desired and any rolebindingrestrictions currently exist, ensure crd/rolebindingrestriction. This supports a migration case to avoid removing groups and users while the tokens are still honored
  3. internal oauth server is configured and not desired and no rolebindingrestrictions currently exist, remove crd/rolebindingrestriction. This supports a migration case.
  4. internal oauth server is not configured and not desired, remove crd/rolebindingrestriction
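The four cases above reduce to a small decision function. The sketch below is only an illustration of that truth table: `decideCRD`, `crdAction`, and the three boolean inputs are hypothetical names, and how a controller would actually compute "desired", "configured", and "RBRs exist" is left out.

```go
package main

import "fmt"

// crdAction is what the hypothetical controller does with the
// rolebindingrestrictions CRD.
type crdAction string

const (
	ensureCRD crdAction = "ensure"
	removeCRD crdAction = "remove"
)

// decideCRD maps the reviewer's four cases onto an action.
//   desired:    the internal OAuth server is wanted (auth type is not OIDC)
//   configured: the internal OAuth server is still configured on the cluster
//   rbrExist:   at least one RoleBindingRestriction currently exists
func decideCRD(desired, configured, rbrExist bool) crdAction {
	switch {
	case desired:
		return ensureCRD // case 1
	case configured && rbrExist:
		return ensureCRD // case 2: migration, keep the CRD while RBRs exist
	default:
		return removeCRD // case 3 (configured, no RBRs) and case 4 (not configured)
	}
}

func main() {
	fmt.Println("case 1:", decideCRD(true, true, false))
	fmt.Println("case 2:", decideCRD(false, true, true))
	fmt.Println("case 3:", decideCRD(false, true, false))
	fmt.Println("case 4:", decideCRD(false, false, false))
}
```

Note that in case 4 the function removes the CRD regardless of `rbrExist`, which matches the list as written; whether that is the intended behaviour when stray resources exist is not stated in the thread.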

Author

Yeah, that all sounds right. I'll update this section to be more clear on the different scenarios.

Author

Updated in c32abf3

Additionally, the CAO will be updated to block OIDC configuration on existence of `RoleBindingRestriction` resources. If `RoleBindingRestriction` resources are found,
the Authentication CR's `OIDCConfig` status field will be updated to contain the following conditions:

- Condition: `Progressing`, Status: `False`, Reason: `Blocked`, Message: `OIDC configuration blocked: RoleBindingRestriction resources found`
Contributor

We can't force Progressing to false; it will go Progressing for other reasons. Use Degraded instead.

Author

@everettraven everettraven Jan 13, 2025

This is for a non-cluster operator related resource (and by this I mean when the cluster operator sync fails this resource shouldn't be updated). This api field doesn't currently exist and therefore nothing is setting it to progressing. I'm fine with changing this to degraded if you think that makes the most sense for this case, but we were trying to be cautious of setting degraded unless the cluster is actually in a broken state.

// +patchStrategy=merge
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions"`
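The `+listType=map` and `+listMapKey=type` markers mean the list is treated as a map keyed by the condition `type`, so there is at most one entry per type. A minimal sketch of that upsert behaviour follows. It uses a trimmed-down local `condition` struct instead of `metav1.Condition` to stay self-contained, `setCondition` is a hypothetical helper, and the `RollingOut` reason is invented for the example; real code would more likely use `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta`.

```go
package main

import "fmt"

// condition is a local, simplified stand-in for metav1.Condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// setCondition inserts c, or replaces the existing entry with the same Type,
// so the slice never holds two conditions of one type.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []condition

	// The blocked condition as written in the proposal text.
	conds = setCondition(conds, condition{
		Type:    "Progressing",
		Status:  "False",
		Reason:  "Blocked",
		Message: "OIDC configuration blocked: RoleBindingRestriction resources found",
	})

	// A later update with the same Type replaces the entry instead of appending.
	// "RollingOut" is a made-up reason used only for this example.
	conds = setCondition(conds, condition{Type: "Progressing", Status: "True", Reason: "RollingOut"})

	fmt.Println(len(conds), conds[0].Status, conds[0].Reason)
}
```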
Contributor

Rather than conditions, do we really want to know

  1. required: is the internal oauth server active
  2. optional: is the external OIDC configured. (what even knows this. nothing?)

Author

We are trying to know when the new revisions of the kube-apiserver have successfully rolled out where it is interacting directly with the external OIDC provider for authn decisions.

We thought the conditions pattern would allow:

  • cluster-authentication-operator to communicate to users progressing status of the oidc configuration
  • cluster-kube-apiserver-operator to communicate to the cluster-authentication-operator and users when the apiserver rollout with this configuration was successful and ready to be used
    • In the case of the cluster-authentication-operator noticing that the rollout was successful, it would begin to remove the oauth workloads and resources from the cluster

I don't think we care about the state of the oauth server unless something goes wrong tearing it down.

Member

We can deduce whether external OIDC has been configured by the following:

  • is there a revisioned OIDC structured auth configmap for each observed current revision of the KAS pods?
  • is the respective structured auth KAS CLI arg enabled for each observed current revision of the KAS pods?
  • are the respective OAuth KAS CLI args disabled for each observed current revision of the KAS pods?

If all the above are true, we can deduce that external OIDC has been configured and rolled out.

If at least one of the KAS pods is on a revision that does not include an OIDC specific config, there is a rollout in progress which is either enabling, or disabling OIDC.

The KAS-o can monitor the rollout status of the KAS pods and update the OIDCConfig.Conditions accordingly.
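The deduction described in this comment could look roughly like the sketch below. It is an illustration only: the `revision` struct, its field names, and the state names are hypothetical, and gathering the three facts per KAS pod revision is out of scope. The `NotConfigured` state (no pod carries any OIDC config) is an addition of this sketch; the comment itself only distinguishes "rolled out" from "rollout in progress".

```go
package main

import "fmt"

// revision records the three checks for one observed current revision of a
// KAS pod. All field names are hypothetical.
type revision struct {
	HasOIDCConfigMap     bool // a revisioned OIDC structured auth configmap exists
	StructuredAuthArgSet bool // the structured auth CLI arg is enabled
	OAuthArgsDisabled    bool // the OAuth-related CLI args are disabled
}

// oidcConfigured reports whether all three checks hold for this revision.
func (r revision) oidcConfigured() bool {
	return r.HasOIDCConfigMap && r.StructuredAuthArgSet && r.OAuthArgsDisabled
}

type rolloutState string

const (
	oidcRolledOut     rolloutState = "RolledOut"
	rolloutInProgress rolloutState = "InProgress"
	oidcNotConfigured rolloutState = "NotConfigured" // assumption of this sketch
)

// oidcRolloutState inspects the revision of every KAS pod.
func oidcRolloutState(revs []revision) rolloutState {
	configured := 0
	for _, r := range revs {
		if r.oidcConfigured() {
			configured++
		}
	}
	switch {
	case len(revs) > 0 && configured == len(revs):
		return oidcRolledOut // every pod runs an OIDC-configured revision
	case configured == 0:
		return oidcNotConfigured
	default:
		return rolloutInProgress // mixed revisions: enabling or disabling OIDC
	}
}

func main() {
	oidc := revision{HasOIDCConfigMap: true, StructuredAuthArgSet: true, OAuthArgsDisabled: true}
	oauth := revision{}
	fmt.Println(oidcRolloutState([]revision{oidc, oidc, oidc}))
	fmt.Println(oidcRolloutState([]revision{oidc, oauth, oidc}))
	fmt.Println(oidcRolloutState([]revision{oauth, oauth, oauth}))
}
```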

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 13, 2025
Contributor

openshift-ci bot commented Jan 13, 2025

New changes are detected. LGTM label has been removed.

Contributor

openshift-ci bot commented Jan 16, 2025

@everettraven: all tests passed!


3 participants