WIP: OpenShift Tests Extension Framework Initial #1676

Open: jupierce wants to merge 13 commits into base: master

Contributor

@jupierce jupierce commented Sep 5, 2024

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2024
Contributor

openshift-ci bot commented Sep 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jupierce. For more information see the Kubernetes Code Review Process.


Comment on lines 119 to 122
##### OpenShift Payload Extension Binaries
For OpenShift payload components, contributors can advertise the existence of an extension binary
by adding information (the imagestream tag for the OCP payload component and the path to the binary
within their image) to a simple registry data structure in github.com/openshift/origin.
Member

Alternative: could we store this info in the release image? (The kind that shows up in oc adm release info -ojson).

This would let us register things more dynamically, and also supply more metadata like which suites it has. Maybe get rid of the info subcommand.

Member

Maybe; it would require updates to oc adm release new.

Member

@sosiouxme sosiouxme left a comment

mostly typos, maybe a few bits of substance

Contributor

@dgoodwin dgoodwin left a comment

Just a few thoughts, looks quite exciting.


Component authors may choose to reduce the number of tests run for non-default
configuration profiles, focusing only on tests likeliest to fail based on the
configuration change, in order to reduce overall execution time.
Contributor

Could we get an example for this one? I couldn't come up with a use case immediately.

Contributor Author

Let's say you want to test a more verbose logging level configuration option in your CRD. You may have hundreds of tests that fully exercise your component in the 'default' configuration, but it is overkill to run them all again simply to verify that debug log statements are being emitted by your pod when verbose logging is enabled.
So you expose an extension configuration for the verbose logging level and output only one test when asked for a list in that configuration. All that test does is read the component's pod logs and make sure that it sees debug output.

If you expose a configuration that disables http/1, you might just want to run a test that verifies an http/1 connection is rejected and an http/2 connection is accepted. If you want to test branding, you might just want to verify that the HTML you scrape contains the newly configured name. If you expose a threshold, you might just want to test that a single expected alert is firing after configuring it.

I can add these to the doc if you buy the premise.
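
A minimal sketch of the premise above, assuming an extension decides which tests to list based on the configuration it is asked about. The names `DEFAULT_TESTS`, `CONFIGURATION_TESTS`, and `list_tests` are hypothetical and are not part of the proposed extension interface; the configuration names are taken from the examples in this comment.

```python
# Hypothetical sketch: the names below are illustrative and are not part of
# the openshift-tests extension interface.

# Stand-in for the hundreds of tests that exercise the default configuration.
DEFAULT_TESTS = [f"component conformance case {i}" for i in range(1, 201)]

# For each non-default configuration, only the tests likeliest to fail.
CONFIGURATION_TESTS = {
    "verbose-logging": [
        "pod logs contain debug output when verbose logging is enabled",
    ],
    "http1-disabled": [
        "http/1 connection is rejected and http/2 connection is accepted",
    ],
}


def list_tests(configuration="default"):
    """Return the test names to run for the given configuration."""
    if configuration == "default":
        return list(DEFAULT_TESTS)
    try:
        return list(CONFIGURATION_TESTS[configuration])
    except KeyError:
        raise ValueError(f"unknown configuration: {configuration}") from None
```

Asked for the default configuration, this lists all 200 stand-in tests; asked for `verbose-logging`, it lists a single test, which is the reduction in execution time the quoted paragraph describes.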

# If applying the configuration implies a disruption, inform
# openshift-tests, so that it can be accounted for in overall
# disruption reporting.
"disruption": "1m",
Contributor

This makes me a little nervous if it's happening in other repos and we don't have good visibility into people abusing this because a problem popped up. Probably not a core concern for this enhancement though.

# testing logic. This allows component readiness
# to display the human-readable version of the test
# name while considering test runs across name changes.
"originalName": "security version compliance",
Contributor

I really like baking this into the component repos; it might get us more renames recorded, as very few people go to the mapping repo.

It would be neat if something could comment on PRs where we see added+removed tests, saying "if this was a rename, please do ...". Those are relatively often renames. Problem for another day though.

# before the test was run.
"component": "default",
}
},
Contributor

Having environment in the results for every test is a little unusual. It seems like mostly a characteristic of a job, so isn't that a lot of duplication?

Contributor Author

There are fixed aspects of the environment, like GCP vs AWS, but the enhancement suggests that environment include component configuration information -- a single job being able to apply and test multiple different configuration options. Imagine an operator being able to cycle through several of its typical configurations, running the same tests or tests specific to those configurations, during the execution of a single job. That configuration is relevant to the outcome of a test and Component Readiness must be able to differentiate the same test name running in one configuration vs another.

A next question would be why we would store those static environmental aspects in the aggregated results file alongside each test. My hope there is that the results files can begin to stand alone. You can just push the file content into a database and you know everything you need to know from the resulting DB. You don't need to parse prowjob job names, for example, to derive additional context about how the test was run. Many tools can ingest a comprehensive file like this directly, so our options for analysis expand. Imagine wanting to move to a new database or use local tooling to analyze the data. With a comprehensive file format, we just ingest the file into our target analysis tool -- no custom logic like what we have in the cloud function required to pull bits and pieces from multiple artifacts.
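
A rough sketch of what "stand alone" could mean in practice, assuming each result record carries its own `environment` map. The `component` key appears in the quoted proposal; the `platform` key, the `result` field, and the function name are assumptions made only for illustration.

```python
import json
from collections import defaultdict

# Hypothetical aggregated results: every record carries its own environment,
# so no job name has to be parsed to recover context.
RESULTS_JSON = """
[
  {"name": "security version compliance", "result": "passed",
   "environment": {"platform": "gcp", "component": "default"}},
  {"name": "security version compliance", "result": "failed",
   "environment": {"platform": "gcp", "component": "verbose-logging"}}
]
"""


def group_by_test_and_configuration(raw):
    """Group results by (test name, component configuration)."""
    grouped = defaultdict(list)
    for record in json.loads(raw):
        key = (record["name"], record["environment"]["component"])
        grouped[key].append(record["result"])
    return dict(grouped)
```

Because the configuration travels with each record, the same test name run under two configurations lands in two separate groups, which is the distinction Component Readiness needs.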

Comment on lines +588 to +589
Note that `run-test` will blindly execute tests in the list as quickly as possible,
in parallel, without consideration for system resources or parallelism constraints
Member

@stbenjam stbenjam Oct 1, 2024

Just a note: OTE doesn't do parallel execution yet. There are some oddities in how we invoke ginkgo tests today. I think I just haven't found all the things I need mutexes for yet.

I assume that's why origin shells out to execute every test.

Member

You can resolve this one, OTE executes in parallel. Ginkgo has a mutex to force serial execution but other frameworks would be parallelized.
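
The behavior described here can be sketched roughly as follows, assuming tests are dispatched to a worker pool and one framework takes a lock so its tests never overlap. This is an illustration in Python, not the OTE implementation; `run_one`, `run_all`, and the `stats` bookkeeping are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

ginkgo_lock = threading.Lock()  # forces serial execution for one framework
stats_lock = threading.Lock()   # protects the bookkeeping below
stats = {"active_ginkgo": 0, "max_active_ginkgo": 0}


def run_one(test):
    """Run a single (framework, name) test and return (name, status)."""
    framework, name = test
    if framework != "ginkgo":
        time.sleep(0.01)  # stand-in for real test work, runs in parallel
        return name, "passed"
    with ginkgo_lock:
        with stats_lock:
            stats["active_ginkgo"] += 1
            stats["max_active_ginkgo"] = max(
                stats["max_active_ginkgo"], stats["active_ginkgo"]
            )
        time.sleep(0.01)  # stand-in for real test work, serialized by the lock
        with stats_lock:
            stats["active_ginkgo"] -= 1
        return name, "passed"


def run_all(tests, workers=4):
    """Dispatch all tests to a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, tests))
```

Tests from other frameworks proceed concurrently, while the lock keeps at most one ginkgo test active at any moment.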

@jupierce jupierce force-pushed the openshift-tests-extension branch 2 times, most recently from 402509b to cdb1d75 Compare October 4, 2024 13:54
Member

@stbenjam stbenjam left a comment

Just a couple of comments. The changes to details are good

Comment on lines +562 to +567
# If a test name is updated at any time in the future,
# originalName must report the original name of the
# testing logic. This allows component readiness
# to display the human-readable version of the test
# name while considering test runs across name changes.
"originalName": "security version compliance",
Member

@stbenjam stbenjam Oct 14, 2024

In the current version I have this as otherNames so we could have the full history of a test's name.

originalName could work, but it must be included in the ExtensionTestResult, and this data must make its way to a column in the junit table. We'd then need to update ci-test-mapping to look at this column when considering the test ID, which should work fine for both the old way (a rename map in the ci-test-mapping repo) and the extension way (stable original name).

I wouldn't expect anyone, including component readiness, to group by otherNames (or even originalName), but rather by the test ID from a join on the ci-test-mapping table. We need to be backwards compatible with the universe today, and with previous openshift releases without extension test binaries, which means continuing to use the mapping table.
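
A small sketch of how the two paths could converge on one test ID, assuming the ID is derived from a stable name. The hash-based ID, the `LEGACY_RENAMES` map, and both function names are hypothetical; the actual ci-test-mapping logic may differ.

```python
import hashlib

# Hypothetical stand-in for the rename map kept in the ci-test-mapping repo.
LEGACY_RENAMES = {"security version check": "security version compliance"}


def stable_name(result):
    """Prefer the extension-reported originalName, else fall back to the map."""
    if result.get("originalName"):
        return result["originalName"]
    return LEGACY_RENAMES.get(result["name"], result["name"])


def derive_test_id(result):
    """Derive an ID from the stable name (assumption: a truncated SHA-256)."""
    return hashlib.sha256(stable_name(result).encode("utf-8")).hexdigest()[:16]
```

A result from an extension binary that reports `originalName` and a result from an older release that relies on the rename map both resolve to the same ID, so grouping by test ID stays backwards compatible.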

BigQuery requires UNNEST for arrays/repeated records. By
storing environment as a JSON object, we can use JSON_EXTRACT_SCALAR
efficiently on maps with unique keys.
@jupierce jupierce force-pushed the openshift-tests-extension branch from 76ccbf2 to cc85139 Compare October 25, 2024 17:36
Contributor

openshift-ci bot commented Oct 25, 2024

@jupierce: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | c2c0d43 | link | true | /test markdownlint |

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 23, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 30, 2024
@stbenjam
Member

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 30, 2024
@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2024
@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 5, 2025

### Risks and Mitigations
Contributor

With all tests in openshift/origin, we only need to bump k8s.io/kubernetes, which provides many useful e2e functions, in that one repo. When we move the tests to individual repos, all of those repos will endure the pain of updating k8s.io/kubernetes. Upstream does not even pretend to have a stable API, and it often breaks.

Alternatively, we would need to spend some non-trivial time rewriting the tests not to depend on k8s.io/kubernetes/test/e2e/framework.

Member

Would it be helpful to have a discussion about your particular use case? I might be missing some context. What e2e functions are you using from k8s.io/kubernetes/test/e2e/framework? Which tests are you looking to move to external binaries for which repos?

  • Our next OTE customer after migrating k8s-tests-ext is ovn-kubernetes, and the ginkgo tests already exist in their repo. We'd mostly be running them unmodified from upstream.

  • The other use case we looked at was moving QE's openshift-tests-private to the component repos, and those don't use anything from k8s.io/kubernetes/test/e2e/framework.

Comment on lines +204 to +206
Optional Operator authors must ensure that the image carrying the extension binary
is identified in their ClusterServiceVersion (CSV) so that tools like `oc-mirror`
will copy image(s) bearing extension binaries to disconnected clusters.
Contributor

We also must ensure that all images used by the extension e2e tests are mirrored by ./openshift-tests images --upstream --to-repository=xyz.

I could be wrong here, but I think this list of images is currently hardwired into the openshift-tests binary at build time. There is already some wording about adding new images to the list here; either it will need to be much stricter, or we would need to get the list of images from extension binaries too.

Member

@stbenjam stbenjam Jan 10, 2025

Justin and I discussed this at one point, but I guess the outcome didn't end up in the enhancement. I think the outcome was that new images would just need a separate PR to origin to add them. It's low frequency enough that it shouldn't be too disruptive. Most tests just use one of a handful of images (agnhost, tools, cli).

We need to solve the helper problem, though.

Comment on lines +204 to +206
Optional Operator authors must ensure that the image carrying the extension binary
is identified in their ClusterServiceVersion (CSV) so that tools like `oc-mirror`
will copy image(s) bearing extension binaries to disconnected clusters.
Contributor

(Starting a separate thread.) To get the name of a mirrored image, it is suggested that a test call github.com/openshift/origin/test/extended/util/image.LocationFor("my.source/image/location:versioned_tag") here. Does the extension need to import openshift/origin? That could be problematic, especially if the extension imports an incompatible version of k8s.io/kubernetes.

Member

This is something we'll need to solve for sure. We want to minimize vendoring requirements in the component repos. I do not want anyone to have to vendor origin code.

Perhaps we could pass the image locations to each test binary and have OTE provide a LocationFor helper. They could be passed either as a CLI flag or as a path to a JSON/YAML file.
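
One way the helper idea could look, sketched in Python under the assumption that the image locations arrive as a JSON object mapping source references to mirrored ones. The file format, the function names, and the fallback to the source reference are all assumptions; nothing here reflects an existing OTE or origin API.

```python
import json


def parse_image_locations(text):
    """Parse a JSON object mapping source image references to mirrored ones."""
    locations = json.loads(text)
    if not isinstance(locations, dict):
        raise ValueError("image locations must be a JSON object")
    return locations


def load_image_locations(path):
    """Read the mapping from a file whose path was handed to the test binary."""
    with open(path, encoding="utf-8") as handle:
        return parse_image_locations(handle.read())


def location_for(source_image, locations):
    """Return the mirrored location for a source image reference."""
    # Assumption: fall back to the source reference when no mirror is recorded.
    return locations.get(source_image, source_image)
```

With a helper of this shape, a test would resolve its image through the mapping it was given rather than by importing origin code, which keeps the vendoring footprint in component repos small.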

@jupierce
Contributor Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 13, 2025
Comment on lines +76 to +78
colocated with the features they are testing. It defines a standardized interface
for test discovery, execution, and result aggregation, allowing decentralized
contributions while maintaining centralized orchestration.
Contributor

allowing decentralized contributions while maintaining centralized orchestration.

Is there any impact on how the new tests are added to the conformance suite, such as openshift/conformance?

Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

7 participants