(oidc): add considerations for impacted kube-apiserver admission plugins #1726
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from ba8e816 to 8246eb8.
This will be done through updates to the appropriate config observers to update the `KubeAPIServerConfig.apiServerArguments` map to:

- Remove the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins from the `--enable-admission-plugins` argument
- Add the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins to the `--disable-admission-plugins` argument (a sketch of this argument handling follows below)
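To make the intended argument handling concrete, here is a minimal, hypothetical sketch of a helper a config observer might use to move the plugins between the two lists; the package and function names are assumptions for illustration, not the actual cluster-kube-apiserver-operator code:

```go
// Hypothetical illustration: move the OAuth-specific admission plugins from
// the enabled list to the disabled list in the observed kube-apiserver config.
// Names and structure are assumptions, not the real config observer code.
package observer

var oauthAdmissionPlugins = []string{
	"authorization.openshift.io/RestrictSubjectBindings",
	"authorization.openshift.io/ValidateRoleBindingRestriction",
}

// adjustAdmissionPlugins returns new enable/disable argument values with the
// OAuth-specific plugins removed from "enabled" and appended to "disabled".
func adjustAdmissionPlugins(enabled, disabled []string) (newEnabled, newDisabled []string) {
	drop := map[string]bool{}
	for _, p := range oauthAdmissionPlugins {
		drop[p] = true
	}

	for _, p := range enabled {
		if !drop[p] {
			newEnabled = append(newEnabled, p)
		}
	}

	newDisabled = append(newDisabled, disabled...)
	for _, p := range oauthAdmissionPlugins {
		if !contains(newDisabled, p) {
			newDisabled = append(newDisabled, p)
		}
	}
	return newEnabled, newDisabled
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}
```

The resulting slices would then be written back into the `KubeAPIServerConfig.apiServerArguments` map under the `enable-admission-plugins` and `disable-admission-plugins` keys, which is where the merge behavior discussed in the following comments comes in.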
For clarity: AFAIU normally it would be enough to remove the plugins from the `--enable-admission-plugins` arg, as they are not default plugins that need explicit disabling. However, the config observer doesn't have access to the final config object and therefore the `--enable-admission-plugins` field, so we'll use the `--disable-admission-plugins` arg to indicate what needs disabling. We'll also need a special merge so that the plugins get removed from the enabled list and added to the disabled one.
IIRC from my experimenting, overriding the `--enable-admission-plugins` value in the config to no longer include these admission plugins did not sufficiently disable them, which is why I specifically call out adding them to the `--disable-admission-plugins` flag.
I'm not sure we need to go into the exact semantics of how this is achieved, but if we do I'm happy to do a bit more digging and figure out what changes may need to be made to the config logic.
Agreed, no need to go into more detail here; I just added this as a note to ourselves, following some digging I did.
This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC.
Suggested change:
Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been rolled out.
If we remove the CRD the moment the auth type becomes OIDC, we won't give admins time to react in case any RBRs exist, as the CRD (and therefore any existing resources) will be removed immediately. I believe we'll want this in two steps: the CAO complains if RBRs exist and doesn't proceed with the OIDC rollout. Once they are deleted, the OIDC rollout proceeds. Once it is completed and OIDC is available (we'll use the new API field for that), OAuth cleanup starts, which includes deleting the CRD.
For the moment, this is the condition used to determine when OIDC has been enabled: https://github.com/openshift/cluster-authentication-operator/pull/740/files#diff-51c6cd196c758006bbe84eed012e6baac4713a856a96b7dfd10adc8ad7986e48R20
When we have the new API though, we'll use that to determine that it's available (i.e. `Available=True`). The KAS-o config observer will make sure to update the status accordingly when it detects that the KAS pods have been rolled out with OIDC.
The OIDC authentication mode on the cluster will not be allowed to be enabled if any `RoleBindingRestriction` resources exist.

To communicate the reason that enablement of the OIDC functionality is blocked, the `Authentication` API will be extended with a new status field describing the condition of the OIDC feature.
Let's discuss further how we'll communicate this; for example, we can set `Available=False`/`Degraded=True` when RBRs exist. We'll also need to take care of some corner cases, e.g. what if someone creates RBRs after the CAO has started the rollout, but before the KAS pods have restarted?
+1 to discussing further how we communicate this. I'll go into a bit more detail on this and then we can refine it from there.
For the corner case where a RBR is created after the CAO has already started the rollout process but before the KAS pods have restarted, my expectation is that we remove the CRD, which in turn deletes the CRs (in this case the newly created RBRs). We can discuss this a bit further if we think that this is an unacceptable user experience, but I think this would be OK for now. We could add warnings in the OpenShift documentation for enabling OIDC that any RBRs created during the rollout of the OIDC functionality will be automatically removed.
> my expectation is that we remove the CRD, which in turn deletes the CRs
I also think this sounds good enough for now 👍
/lgtm
Holding until update commits are squashed.
Force-pushed from 6a23a52 to 611371b.
Signed-off-by: Bryce Palmer <[email protected]>
Force-pushed from 611371b to 897ae74.
/lgtm
/hold cancel
In order to prevent misleading logs about informers that failed to start, or about failures to connect to the oauth-apiserver, the following changes to this patch are to be made:

- Informers for the `Group` API are only configured and started as part of the first run of the `authorization.openshift.io/RestrictSubjectBindings` admission plugin validation loop. This means the informer will not be configured, or attempt to start, when the admission plugin is disabled (illustrated in the sketch after this list).
- The post-start hook that checks for oauth-apiserver connectivity will be skipped if the `Authentication` resource `.spec.type` is set to `OIDC`. This will prevent logs in the kube-apiserver associated with not being able to connect to the oauth-apiserver, which we know should not be running when OIDC is enabled.
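As a rough illustration of the lazy-informer idea in the first bullet above, a plugin could defer constructing and starting the informer until its first validation call. The types and method names below are simplified placeholders, not the actual RestrictSubjectBindings implementation:

```go
// Hypothetical sketch of deferring informer startup until the admission plugin
// actually runs; types and wiring are simplified placeholders.
package admission

import (
	"context"
	"sync"
)

// groupCache stands in for the Group informer wiring; the real plugin would
// use a generated informer/lister for the groups.user.openshift.io API.
type groupCache interface {
	Start(stopCh <-chan struct{})
	WaitForSync(stopCh <-chan struct{}) bool
}

type restrictSubjectBindings struct {
	startOnce sync.Once
	groups    groupCache
	stopCh    <-chan struct{}
}

// Validate lazily starts the Group informer on first use, so a disabled
// plugin never constructs or starts it (and never logs informer start
// failures or oauth-apiserver connection errors).
func (p *restrictSubjectBindings) Validate(ctx context.Context) error {
	p.startOnce.Do(func() {
		p.groups.Start(p.stopCh)
		p.groups.WaitForSync(p.stopCh)
	})
	// ... the actual RoleBindingRestriction evaluation would happen here ...
	return nil
}
```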
The kube-apiserver should avoid using the current state of API resources to control its behavior. Doing so means that manipulation via the API changes runtime behavior in a way that leaves us unable to be confident about when a cluster is or is not enforcing a behavior. Additionally, it means that code must react dynamically, which is generally more complex to build.
Instead, the admission chain can be configured by the operator.
Not controlling the behavior of the kube-apiserver with api resources makes sense to me. I don't think configuration of the admission chain itself will be sufficient to skip this post-start hook.
Would checking for all the criteria (apiserver flags, etc.) to know that the kube-apiserver is not using the openshift oauth-apiserver for auth decisions be possible and sufficient?
Updated in c32abf3 with something that I thought would be sufficient, based on information available to the kube-apiserver without relying on other API resources.
Another thought: add a new flag, set by the cluster operators, for enabling/disabling the post-start hook.
- Disable the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins
- Remove the `rolebindingrestrictions.authorization.openshift.io` CustomResourceDefinition
- Block OIDC enablement while any `RoleBindingRestriction` resources exist; this will be communicated in the `Authentication` resource via the `OIDCConfig` status field
If we did this, how would an HCP user be aware?
Does HCP already alert users on cluster operator degraded conditions? This status condition on the `Authentication` resource is in addition to the existing cluster-authentication-operator cluster operator conditions (I don't recall the exact resource type off the top of my head).
##### Changes to the cluster-kube-apiserver-operator

When the authentication type is set to OIDC, the `authorization.openshift.io/RestrictSubjectBindings` and `authorization.openshift.io/ValidateRoleBindingRestriction` admission plugins will be disabled.
Be specific. Is this spec or status? Can you describe what should happen in the transition case?
Updated in c32abf3.
This will mean vendoring the generated CRD manifests as outlined in https://github.com/openshift/api/tree/master?tab=readme-ov-file#vendoring-generated-manifests-into-other-repositories and adding a new controller to manage the CRD.

Managing the CRD will consist of ensuring that the CRD is present on the cluster, and matches the desired manifest, when the authentication type is _not_ OIDC, and ensuring the CRD is not present on the cluster when the authentication type _is_ OIDC and OIDC configuration has been successfully rolled out.
Thinking through cases, is it actually (see the sketch after this list):
- internal oauth server is desired, ensure crd/rolebindingrestriction
- internal oauth server is configured and not desired and any rolebindingrestrictions currently exist, ensure crd/rolebindingrestriction. This supports a migration case to avoid removing groups and users while the tokens are still honored
- internal oauth server is configured and not desired and no rolebindingrestrictions currently exist, remove crd/rolebindingrestriction. This supports a migration case.
- internal oauth server is not configured and not desired, remove crd/rolebindingrestriction
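A minimal sketch of that case analysis, with hypothetical inputs standing in for the Authentication spec, the currently observed kube-apiserver configuration, and a RoleBindingRestriction lister (not actual operator code):

```go
// Hypothetical sketch of the CRD management decision described above.
package cao

// shouldEnsureRBRCRD captures the four cases: "oauthDesired" reflects whether
// the internal OAuth server is desired (Authentication spec), "oauthConfigured"
// whether the kube-apiserver is still configured for it, and "rbrExist" whether
// any RoleBindingRestriction resources currently exist.
func shouldEnsureRBRCRD(oauthDesired, oauthConfigured, rbrExist bool) bool {
	switch {
	case oauthDesired:
		// Internal OAuth server is desired: keep the CRD in place.
		return true
	case oauthConfigured && rbrExist:
		// Migration in progress and restrictions still exist: keep the CRD so
		// nothing is removed while OAuth tokens are still honored.
		return true
	case oauthConfigured && !rbrExist:
		// Migration in progress with nothing to preserve: remove the CRD.
		return false
	default:
		// OAuth neither configured nor desired: remove the CRD.
		return false
	}
}
```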
Yeah, that all sounds right. I'll update this section to be more clear on the different scenarios.
Updated in c32abf3
Additionally, the CAO will be updated to block OIDC configuration when `RoleBindingRestriction` resources exist. If `RoleBindingRestriction` resources are found, the Authentication CR's `OIDCConfig` status field will be updated to contain the following conditions:

- Condition: `Progressing`, Status: `False`, Reason: `Blocked`, Message: `OIDC configuration blocked: RoleBindingRestriction resources found` (see the sketch below)
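As a sketch of how the CAO could set that condition, assuming the proposed (not-yet-existing) `OIDCConfig` status field and the standard apimachinery condition helpers; note the discussion below suggests `Degraded` may end up being more appropriate than `Progressing`:

```go
// Hypothetical sketch only; the OIDCConfig status field does not exist yet.
package cao

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setOIDCBlockedCondition records the blocking condition on the proposed
// OIDCConfig status conditions when RoleBindingRestriction resources remain.
func setOIDCBlockedCondition(conditions *[]metav1.Condition, rbrCount int) {
	if rbrCount == 0 {
		return
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Progressing",
		Status:  metav1.ConditionFalse,
		Reason:  "Blocked",
		Message: "OIDC configuration blocked: RoleBindingRestriction resources found",
	})
}
```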
Can't force Progressing to false; it'll go progressing for other reasons. Instead, use Degraded.
This is for a non-cluster-operator-related resource (and by this I mean that when the cluster operator sync fails, this resource shouldn't be updated). This API field doesn't currently exist and therefore nothing is setting it to Progressing. I'm fine with changing this to Degraded if you think that makes the most sense for this case, but we were trying to be cautious about setting Degraded unless the cluster is actually in a broken state.
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions"`
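For context, this field would presumably sit on a new status type along the following lines; only the Conditions field and its markers are quoted from the diff above, while the struct name, comments, and extra struct tags are illustrative assumptions:

```go
// Hypothetical sketch, not a finalized openshift/api definition.
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// OIDCConfigStatus is an illustrative placeholder for the proposed status type.
type OIDCConfigStatus struct {
	// conditions communicates the state of the OIDC rollout, e.g. whether the
	// kube-apiserver revisions carrying the OIDC configuration have rolled out.
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions" patchStrategy:"merge" patchMergeKey:"type"`
}
```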
Rather than conditions, do we really want to know:
- required: is the internal oauth server active
- optional: is the external OIDC configured. (what even knows this? nothing?)
We are trying to know when the new revisions of the kube-apiserver have successfully rolled out where it is interacting directly with the external OIDC provider for authn decisions.
We thought the conditions pattern would allow:
- cluster-authentication-operator to communicate to users progressing status of the oidc configuration
- cluster-kube-apiserver-operator to communicate to the cluster-authentication-operator and users when the apiserver rollout with this configuration was successful and ready to be used
- In the case of the cluster-authentication-operator noticing that the rollout was successful, it would begin to remove the oauth workloads and resources from the cluster
I don't think we care about the state of the oauth server unless something goes wrong tearing it down.
We can deduce whether external OIDC has been configured by the following:
- is there a revisioned OIDC structured auth configmap for each observed current revision of the KAS pods?
- is the respective structured auth KAS CLI arg enabled for each observed current revision of the KAS pods?
- are the respective OAuth KAS CLI args disabled for each observed current revision of the KAS pods?
If all the above are true, we can deduce that external OIDC has been configured and rolled out.
If at least one of the KAS pods is on a revision that does not include an OIDC specific config, there is a rollout in progress which is either enabling, or disabling OIDC.
The KAS-o can monitor the rollout status of the KAS pods and update the `OIDCConfig.Conditions` accordingly (a rough sketch of this check follows).
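A rough sketch of that per-revision deduction, with placeholder inputs standing in for the revisioned configmaps and observed kube-apiserver arguments; the flag name mentioned in the comments is an assumption based on upstream structured authentication configuration:

```go
// Hypothetical sketch of deducing whether external OIDC is fully rolled out.
package kaso

// revisionState is a placeholder for per-revision data the KAS-o would gather
// from revisioned configmaps and the observed kube-apiserver arguments.
type revisionState struct {
	hasOIDCAuthConfigMap bool // revisioned structured auth configmap exists for this revision
	structuredAuthArgSet bool // structured auth CLI arg (assumed --authentication-config) points at it
	oauthArgsDisabled    bool // OAuth-specific authentication CLI args are not set
}

// oidcRolledOut returns true only if every currently observed kube-apiserver
// revision carries the OIDC configuration; otherwise a rollout is still in
// progress, either enabling or disabling OIDC.
func oidcRolledOut(revisions []revisionState) bool {
	if len(revisions) == 0 {
		return false
	}
	for _, r := range revisions {
		if !r.hasOIDCAuthConfigMap || !r.structuredAuthArgSet || !r.oauthArgsDisabled {
			return false
		}
	}
	return true
}
```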
Signed-off-by: Bryce Palmer <[email protected]>
New changes are detected. LGTM label has been removed.
Signed-off-by: Bryce Palmer <[email protected]>
Signed-off-by: Bryce Palmer <[email protected]>
@everettraven: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Updates the original OIDC enhancement proposal to add some considerations for how we resolve an issue with the OpenShift default authorization.openshift.io/RestrictSubjectBindings admission plugin when enabling OIDC.