-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Enhancement to outline path for network policies for all core components #1720
base: master
Are you sure you want to change the base?
Conversation
- As an administrator, I need to be able to override specific | ||
OpenShift policies to be more restrictive so that I can satisfy my | ||
security department |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the most part, this should not be necessary. If a given restriction is safe to apply, then it should be part of the default policy. And in that case, anything "more restrictive" would be something that would cause cluster functionality to break in some way. IMO we shouldn't allow admins to break functionality they aren't using, we should allow them to disable that functionality instead (and then they don't need to add NetworkPolicies restricting its use).
One specific case I can think of where this might apply is admission control webhooks / validation webhooks. The apiserver needs to reach out to any namespace that contains such a webhook, so by default, we'd have to allow the apiserver egress access to all namespaces (or at least, all "system" namespaces?). Administrators might want to disable that. But should that be expressed as "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that", or should we instead have kube-apiserver-operator just automatically add NPs for all and only the namespaces that contain webhooks? Or should we have a rule like "webhooks can only exist in namespaces with the label foo
"?
(Also, FTR, there is no way to implement "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that" given the semantics of NetworkPolicy; we would have to say that administrators are allowed to delete the "kube-apiserver can reach all namespaces" NetworkPolicy.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want administrators to create AdminNetworkPolicy to express their exceptions (where they can deny).
Ultimately, I would love for operators to have a way to have config that allows them to create policy to restrict things like webhooks. They may need an allow-list of valid domains, or something.
But, for the shorter term, we already have people deploying their own Network Policy (not admin, it wasn't ready then) to restrict our stuff since we have no policy today.
1. Change the namespace admission controller so all namespaces with an | ||
openshift- prefix (since those are special anyway) are labeled with | ||
the “openshift-namespace’ label so **that network policy can address | ||
them** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always forget what the rule about OLM operators and openshift-
is, so maybe you should explain that somewhere. (In particular, it seems from some of the other comments that OLM operators can make openshift-
namespaces, but then how does that work with "(Not) Adding policies for operators not created by Red Hat"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule is horrible.
They get installed into a default openshift-operators namespace, but the OLM will need to own the policy for that namespace. I suspect it will need to be wildcard allow egress and ingress.
But, worse, the admin can chose any namespace to add it to... there are ongoing conversations about how to deal with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Former OLM team member here (recently moved to control plane group) 👋
OLM installed operators probably won't be able to specify NetworkPolicy
resources in their bundles according to https://olm.operatorframework.io/docs/tasks/creating-operator-manifests/#packaging-additional-objects-alongside-an-operator
OLMv1 is supposed to GA in 4.18 and will, initially, respect a subset of the existing bundle format. The desire is to eventually support bundling formats that allows operator authors more control as to the contents packaged within their bundle (OLMv1 is going to be unopinionated in what is in the bundles) meaning operator authors will eventually be able to specify NetworkPolicy
resources in their bundles.
With the current bundling format it seems like OLM would need to be updated to either:
- Allow
NetworkPolicy
resources in the bundle alongside the CSV. IIUC, to achieve compliance with the high-level goal of this proposal this would likely mean updating all bundles in the catalogs to contain a sufficientNetworkPolicy
resource. Some open questions for this approach:- Do we have enough control of the catalog curation pipelines to ensure this is done properly?
- Should a wildcard
NetworkPolicy
be added to all bundles by default if the authors of the operator can't/won't add one themselves?
- Stamp out a permissive
NetworkPolicy
resource on each operator installation.
With the focus on OLMv1 and some general attrition, there are not many folks on the OLM team left with knowledge of the legacy "OLMv0". Both of these are likely not trivial asks of the OLM team.
I do wonder if having CVO manage a default NetworkPolicy
for the openshift-operators
namespace, that is permissive enough for all operators installed in that namespace, is sufficient enough to achieve the compliance we are looking for even if admins can install an operator in any namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
Operators running in the openshift-operators
namespace may need network access to just about anything, so I agree with @knobunc that OLM will likely end up needing to ship with an allow-all policy.
Beyond that, I am pretty leery of any solution that requires operators to change their bundle contents, as this will effectively block the use of older versions of $everything.
I'm not clear on whether there's a mechanism that allows the cluster admin to install/override/configure the NetworkPolicy
in an impacted namespace. I see a knob to shut the whole thing down. And this section hints at it, but I can't tell what actor (cluster admin or developer) or environment (dev/test or production) it's talking about. Anyway, it seems like a per-namespace "escape hatch" would help mitigate this issue, as it would allow operators to publish documentation describing how to retrofit policies for legacy versions.
connections that a namespace uses should make future development of | ||
Admin Network Policy easier. We will be using Admin Network Policy | ||
to allow cluster admins to override our namespace Network Policies | ||
if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is ANP a Non-Goal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use ANP as part of the solution, either created by the cluster admin to override, or by our operator with well-known labels that their namespaces can have applied. BUT I don't want to require the components to develop ANP. Partly because I can't work out how to get them added to a cluster... I want the components to manager their own policies and not have a central ANP repositiory. That way the operator can upgrade the policy as needed to ensure that they can seamlessly upgrade. The operator may need to install new policy before an upgrade and then remove the old policy once complete.
|
||
We also need to provide this information to the support organization | ||
in a must-gather. We will augment the must-gather tool to include | ||
information about network policy blocks in OpenShift namespaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that sufficiently-broken policies could interfere with both the Network Observability tool and must-gather.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully we can catch at least some of that in automated testing. I am assuming that these are the RH shipped policies, not random ones that a customer applied to us.
|
||
Since nothing needs to use the labels immediately, there is no concern about ordering and racing, as long as the change happens during the upgrade. | ||
|
||
At which point, the cluster administrator can use the new labels to easily apply default policies to all of the OpenShift namespaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do the new policies get created? How do you ensure there won't be problems when some namespaces have policies and some don't, and some have labels and some don't?
In my earlier proposal (#613) I suggested we'd have to roll this out over several releases to avoid problems...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will likely take a while to roll out since each team will need to add policy to their namespaces. And I suspect they will need assistance developing the policy, which is where ACS may be able to help for static analysis of objects and dynamic analysis of trafic and it will suggest policies based on that.
…nents This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.
First round of updates in response to review comments, and some other clean-up.
0244f94
to
25b3070
Compare
@knobunc: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
|
||
## Proposal | ||
|
||
1. Change the namespace admission controller so all namespaces with an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new, openshift/kubernetes admission plugin is probably a simpler implementation. This will not work on HCP.
Use ValidatingAdmissionPolicy to prevent the removal of the label on openshift-*
namespaces.
3. Work out how we can identify connections blocked by policy in | ||
OpenShift namespaces (either egress or ingress) and work out how to | ||
include it in a must-gather |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical feature for development and field debugging. Very glad to see it here.
4. If it does not start with `openshift-` then any | ||
`security.openshift.io/openshift-namespace` label will be stripped out and can not be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't mutate the input, but use ValidatingAdmissionPolicy to cause creation to fail to prevent this in newer clusters. Having the cluster-network-operator detect a violation of this rule and prevent upgrade until the cluster-admin corrects the problem is a good idea.
3. Otherwise it will strip the `security.openshift.io/openshift-namespace` label | ||
|
||
TODO: Decide if this is the correct behavior... do we want to allow | ||
namespaces to opt-in to being part of the platform. I do not think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our platform namespaces are openshift-*
namespaces. That's how things like monitoring recognize them. To be a part of the platform requires using that prefix.
1. Document what traffic flows (ingress and egress) need to be | ||
allowed for pods in the namespaces | ||
|
||
2. The ACS product can be used to analyze workloads in a cluster to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't deploy ACS in CI. When ACS isn't installed, what options are available?
trying to talk to whom and mint tight network policies to reflect | ||
that model. | ||
|
||
4. The Network Observability tool can be used to detect when a network policy blocked traffic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Network observability isn't available by default. Without network observability, how will we identify and resolve issues?
Are there any unique considerations for making this change work with Hypershift? | ||
|
||
No. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised there is no impact to having the kube-apiserver run outside the cluster and reach into the cluster to call things like admission webhooks. How about a specific admission webhook and CRD conversion webhook test to confirm that there is no HCP impact?
How does this proposal affect MicroShift? For example, if the proposal | ||
adds configuration options through API resources, should any of those | ||
behaviors also be exposed to MicroShift admins through the | ||
configuration file for MicroShift? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microshift doesn't run the cluster-network-operator. How will upgrades of microshift get the appropriate namespace labels? How will upgrades prevent non-openshift-* namespaces from squatting on the label?
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.