WIP Enhancement to outline path for network policies for all core components #1720

knobunc · 2024-11-26T17:38:26Z

This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.

danwinship · 2024-12-02T16:51:11Z

enhancements/network/core-network-policies.md

+- As an administrator, I need to be able to override specific
+  OpenShift policies to be more restrictive so that I can satisfy my
+  security department


For the most part, this should not be necessary. If a given restriction is safe to apply, then it should be part of the default policy. And in that case, anything "more restrictive" would be something that would cause cluster functionality to break in some way. IMO we shouldn't allow admins to break functionality they aren't using, we should allow them to disable that functionality instead (and then they don't need to add NetworkPolicies restricting its use).

One specific case I can think of where this might apply is admission control webhooks / validation webhooks. The apiserver needs to reach out to any namespace that contains such a webhook, so by default, we'd have to allow the apiserver egress access to all namespaces (or at least, all "system" namespaces?). Administrators might want to disable that. But should that be expressed as "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that", or should we instead have kube-apiserver-operator just automatically add NPs for all and only the namespaces that contain webhooks? Or should we have a rule like "webhooks can only exist in namespaces with the label foo"?

(Also, FTR, there is no way to implement "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that" given the semantics of NetworkPolicy; we would have to say that administrators are allowed to delete the "kube-apiserver can reach all namespaces" NetworkPolicy.)

I want administrators to create AdminNetworkPolicy to express their exceptions (where they can deny).

Ultimately, I would love for operators to have a way to have config that allows them to create policy to restrict things like webhooks. They may need an allow-list of valid domains, or something.

But, for the shorter term, we already have people deploying their own Network Policy (not admin, it wasn't ready then) to restrict our stuff since we have no policy today.

enhancements/network/core-network-policies.md

danwinship · 2024-12-02T16:58:14Z

enhancements/network/core-network-policies.md

+1. Change the namespace admission controller so all namespaces with an
+   openshift- prefix (since those are special anyway) are labeled with
+   the “openshift-namespace’ label so **that network policy can address
+   them**


I always forget what the rule about OLM operators and openshift- is, so maybe you should explain that somewhere. (In particular, it seems from some of the other comments that OLM operators can make openshift- namespaces, but then how does that work with "(Not) Adding policies for operators not created by Red Hat"?)

The rule is horrible.

They get installed into a default openshift-operators namespace, but the OLM will need to own the policy for that namespace. I suspect it will need to be wildcard allow egress and ingress.

But, worse, the admin can chose any namespace to add it to... there are ongoing conversations about how to deal with this.

Former OLM team member here (recently moved to control plane group) 👋

OLM installed operators probably won't be able to specify NetworkPolicy resources in their bundles according to https://olm.operatorframework.io/docs/tasks/creating-operator-manifests/#packaging-additional-objects-alongside-an-operator

OLMv1 is supposed to GA in 4.18 and will, initially, respect a subset of the existing bundle format. The desire is to eventually support bundling formats that allows operator authors more control as to the contents packaged within their bundle (OLMv1 is going to be unopinionated in what is in the bundles) meaning operator authors will eventually be able to specify NetworkPolicy resources in their bundles.

With the current bundling format it seems like OLM would need to be updated to either:

Allow NetworkPolicy resources in the bundle alongside the CSV. IIUC, to achieve compliance with the high-level goal of this proposal this would likely mean updating all bundles in the catalogs to contain a sufficient NetworkPolicy resource. Some open questions for this approach:

Do we have enough control of the catalog curation pipelines to ensure this is done properly?

Should a wildcard NetworkPolicy be added to all bundles by default if the authors of the operator can't/won't add one themselves?

Stamp out a permissive NetworkPolicy resource on each operator installation.

With the focus on OLMv1 and some general attrition, there are not many folks on the OLM team left with knowledge of the legacy "OLMv0". Both of these are likely not trivial asks of the OLM team.

I do wonder if having CVO manage a default NetworkPolicy for the openshift-operators namespace, that is permissive enough for all operators installed in that namespace, is sufficient enough to achieve the compliance we are looking for even if admins can install an operator in any namespace?

👀

Operators running in the openshift-operators namespace may need network access to just about anything, so I agree with @knobunc that OLM will likely end up needing to ship with an allow-all policy.

Beyond that, I am pretty leery of any solution that requires operators to change their bundle contents, as this will effectively block the use of older versions of $everything.

I'm not clear on whether there's a mechanism that allows the cluster admin to install/override/configure the NetworkPolicy in an impacted namespace. I see a knob to shut the whole thing down. And this section hints at it, but I can't tell what actor (cluster admin or developer) or environment (dev/test or production) it's talking about. Anyway, it seems like a per-namespace "escape hatch" would help mitigate this issue, as it would allow operators to publish documentation describing how to retrofit policies for legacy versions.

enhancements/network/core-network-policies.md

danwinship · 2024-12-02T17:00:17Z

enhancements/network/core-network-policies.md

+  connections that a namespace uses should make future development of
+  Admin Network Policy easier.  We will be using Admin Network Policy
+  to allow cluster admins to override our namespace Network Policies
+  if needed.


why is ANP a Non-Goal?

We can use ANP as part of the solution, either created by the cluster admin to override, or by our operator with well-known labels that their namespaces can have applied. BUT I don't want to require the components to develop ANP. Partly because I can't work out how to get them added to a cluster... I want the components to manager their own policies and not have a central ANP repositiory. That way the operator can upgrade the policy as needed to ensure that they can seamlessly upgrade. The operator may need to install new policy before an upgrade and then remove the old policy once complete.

enhancements/network/core-network-policies.md

danwinship · 2024-12-02T17:14:04Z

enhancements/network/core-network-policies.md

+
+We also need to provide this information to the support organization
+in a must-gather.  We will augment the must-gather tool to include
+information about network policy blocks in OpenShift namespaces.


Note that sufficiently-broken policies could interfere with both the Network Observability tool and must-gather.

Hopefully we can catch at least some of that in automated testing. I am assuming that these are the RH shipped policies, not random ones that a customer applied to us.

danwinship · 2024-12-02T17:16:19Z

enhancements/network/core-network-policies.md

+
+Since nothing needs to use the labels immediately, there is no concern about ordering and racing, as long as the change happens during the upgrade.
+
+At which point, the cluster administrator can use the new labels to easily apply default policies to all of the OpenShift namespaces.	


When do the new policies get created? How do you ensure there won't be problems when some namespaces have policies and some don't, and some have labels and some don't?

In my earlier proposal (#613) I suggested we'd have to roll this out over several releases to avoid problems...

This will likely take a while to roll out since each team will need to add policy to their namespaces. And I suspect they will need assistance developing the policy, which is where ACS may be able to help for static analysis of objects and dynamic analysis of trafic and it will suggest policies based on that.

…nents This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.

First round of updates in response to review comments, and some other clean-up.

openshift-ci · 2024-12-03T19:28:06Z

@knobunc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/markdownlint	`25b3070`	link	true	`/test markdownlint`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-bot · 2025-01-01T01:15:07Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

deads2k · 2025-01-07T15:49:23Z

enhancements/network/core-network-policies.md

+
+## Proposal
+
+1. Change the namespace admission controller so all namespaces with an


A new, openshift/kubernetes admission plugin is probably a simpler implementation. This will not work on HCP.

Use ValidatingAdmissionPolicy to prevent the removal of the label on openshift-* namespaces.

deads2k · 2025-01-07T15:50:38Z

enhancements/network/core-network-policies.md

+3. Work out how we can identify connections blocked by policy in
+   OpenShift namespaces (either egress or ingress) and work out how to
+   include it in a must-gather


Critical feature for development and field debugging. Very glad to see it here.

deads2k · 2025-01-07T15:52:33Z

enhancements/network/core-network-policies.md

+4.  If it does not start with `openshift-` then any
+    `security.openshift.io/openshift-namespace` label will be stripped out and can not be set


Don't mutate the input, but use ValidatingAdmissionPolicy to cause creation to fail to prevent this in newer clusters. Having the cluster-network-operator detect a violation of this rule and prevent upgrade until the cluster-admin corrects the problem is a good idea.

deads2k · 2025-01-07T15:53:50Z

enhancements/network/core-network-policies.md

+   3. Otherwise it will strip the `security.openshift.io/openshift-namespace` label
+
+   TODO: Decide if this is the correct behavior... do we want to allow
+   namespaces to opt-in to being part of the platform.  I do not think


Our platform namespaces are openshift-* namespaces. That's how things like monitoring recognize them. To be a part of the platform requires using that prefix.

deads2k · 2025-01-07T15:56:01Z

enhancements/network/core-network-policies.md

+    1. Document what traffic flows (ingress and egress) need to be
+       allowed for pods in the namespaces
+
+    2. The ACS product can be used to analyze workloads in a cluster to


We don't deploy ACS in CI. When ACS isn't installed, what options are available?

deads2k · 2025-01-07T15:56:36Z

enhancements/network/core-network-policies.md

+       trying to talk to whom and mint tight network policies to reflect
+       that model.
+
+    4. The Network Observability tool can be used to detect when a network policy blocked traffic


Network observability isn't available by default. Without network observability, how will we identify and resolve issues?

deads2k · 2025-01-07T15:58:08Z

enhancements/network/core-network-policies.md

+Are there any unique considerations for making this change work with Hypershift?
+
+No. 


I'm surprised there is no impact to having the kube-apiserver run outside the cluster and reach into the cluster to call things like admission webhooks. How about a specific admission webhook and CRD conversion webhook test to confirm that there is no HCP impact?

deads2k · 2025-01-07T16:00:03Z

enhancements/network/core-network-policies.md

+How does this proposal affect MicroShift? For example, if the proposal
+adds configuration options through API resources, should any of those
+behaviors also be exposed to MicroShift admins through the
+configuration file for MicroShift?


Microshift doesn't run the cluster-network-operator. How will upgrades of microshift get the appropriate namespace labels? How will upgrades prevent non-openshift-* namespaces from squatting on the label?

openshift-ci · 2025-01-07T16:04:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from knobunc. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

enhancements/network/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2025-01-15T00:45:11Z

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot requested review from danwinship and trozet November 26, 2024 17:38

knobunc changed the title ~~Enhancement to outline path for network policies for all core components~~ WIP Enhancement to outline path for network policies for all core components Nov 26, 2024

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024

danwinship reviewed Dec 2, 2024

View reviewed changes

knobunc added 2 commits December 3, 2024 13:53

Enhancement to outline how we get network policies for all core compo…

c185b85

…nents This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.

Responded to review comments

25b3070

First round of updates in response to review comments, and some other clean-up.

knobunc force-pushed the core-network-policies branch from 0244f94 to 25b3070 Compare December 3, 2024 18:57

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2025

deads2k requested changes Jan 7, 2025

View reviewed changes

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP Enhancement to outline path for network policies for all core components #1720

WIP Enhancement to outline path for network policies for all core components #1720

knobunc commented Nov 26, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

everettraven Dec 3, 2024

2uasimojo Jan 7, 2025

danwinship Dec 2, 2024

knobunc Dec 3, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

openshift-ci bot commented Dec 3, 2024

openshift-bot commented Jan 1, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

openshift-ci bot commented Jan 7, 2025

openshift-bot commented Jan 15, 2025


		Since nothing needs to use the labels immediately, there is no concern about ordering and racing, as long as the change happens during the upgrade.

		At which point, the cluster administrator can use the new labels to easily apply default policies to all of the OpenShift namespaces.


		## Proposal

		1. Change the namespace admission controller so all namespaces with an

		4. If it does not start with `openshift-` then any
		`security.openshift.io/openshift-namespace` label will be stripped out and can not be set

		Are there any unique considerations for making this change work with Hypershift?

		No.

WIP Enhancement to outline path for network policies for all core components #1720

Are you sure you want to change the base?

WIP Enhancement to outline path for network policies for all core components #1720

Conversation

knobunc commented Nov 26, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-ci bot commented Dec 3, 2024

openshift-bot commented Jan 1, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-ci bot commented Jan 7, 2025

openshift-bot commented Jan 15, 2025