-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: OpenShift Tests Extension Framework Initial #1676
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
##### OpenShift Payload Extension Binaries | ||
For OpenShift payload components contributors can advertise the existence of an extension binary | ||
by adding information (the imagestream tag for the OCP payload component and the path to the binary | ||
within their image) to a simple registry datastructure in github.com/openshift/origin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative: could we store this info in the release image? (The kind that shows up in oc adm release info -ojson
).
This would let us register things more dynamically, and also supply more metadata like which suites it has. Maybe get rid of the info
subcommand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe; would require updates to oc adm release new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly typos, maybe a few bits of substance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts, looks quite exciting.
|
||
Component authors may choose to reduce the number of tests run for non-default | ||
configuration profiles, focusing only on tests likeliest to fail based on the | ||
configuration change, in order to reduce overall execution time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get an example on this one, I couldn't come up with a use case immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say you want to test a more verbose logging level configuration option in your CRD. You may have hundreds of tests that fully exercise your component in the 'default' configuration, but it is overkill to run them all again simply to verify that debug log statements are being emitted by your pod when verbose logging is enabled.
So you expose an extension configuration for the verbose logging level, and output only one test when asked for a list in that configuration. All that test does is read the component's pod logs and makes sure that it sees debug output.
If you expose a configuration that disables http/1, you might just want to run a test that verifies an http/1 connection is rejected and an http/2 connection is accepted. If you want to test branding, you might just want to verify that the HTML you scrape contains the newly configured name. If you expose a threshold, you might just want to test that a single expected alert is firing after configuring it.
I can add these to the doc if you buy the premise.
# If applying the configuration implies a disruption, inform | ||
# openshift-tests, so that it can be accounted for in overall | ||
# disruption reporting. | ||
"disruption": "1m", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me a little nervous if it's happening in other repos and we didn't have good visibility into people abusing this because a problem popped up. Probably not a core concern for this enhancement though.
# testing logic. This allows component readiness | ||
# to display the human-readable version of the test | ||
# name while considering test runs across name changes. | ||
"originalName": "security version compliance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like baking this into the component repos, might get us more renames as very very few people go to the mapping repo.
It would be neat if something could comment on PRs where we see added+removed tests that "if this was a rename, please do ...". That would be relatively often a rename. Problem for another day though.
# before the test was run. | ||
"component": "default", | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment is a little unusual in the results for every test, that seems like mostly a characteristic of a job and a lot of duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are fixed aspects of the environment, like GCP vs AWS, but the enhancement suggests that environment include component configuration information -- a single job being able to apply and test multiple different configuration options. Imagine an operator being able to cycle through several of its typical configurations, running the same tests or tests specific to those configurations, during the execution of a single job. That configuration is relevant to the outcome of a test and Component Readiness must be able to differentiate the same test name running in one configuration vs another.
A next question would be why we would store those static environmental aspects in the aggregated results file alongside each test. My hope there is that the results files can begin to stand alone. You can just push the file content into a database and you know everything you need to know from the resulting DB. You don't need parse prowjob job names, for example, to derive additional context about how the test was run. Many tools can ingest a comprehensive file like this directly, so our options for analysis expand. Imagine wanting to move to a new database or use local tooling to analyze the data. With a comprehensive file format, we just ingest the file into our target analysis tool -- no custom logic like what we have in the cloud function required to pull bits and pieces from multiple artifacts.
Note that `run-test` will blindly execute tests in the list as quickly as possible, | ||
in parallel, without consideration for system resources or parallelism constraints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, OTE doesn't do parallel execution yet. There's some oddities about how we invoke ginkgo tests today. I think I just haven't found all the things I need mutexes for yet.
I assume that's why origin shells out to execute every test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can resolve this one, OTE executes in parallel. Ginkgo has a mutex to force serial execution but other frameworks would be parallelized.
402509b
to
cdb1d75
Compare
cdb1d75
to
ac860f4
Compare
Update the doc based on my comments + actual current implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments. The changes to details
are good
# If a test name is updated at any time in the future, | ||
# originalName must report the original name of the | ||
# testing logic. This allows component readiness | ||
# to display the human-readable version of the test | ||
# name while considering test runs across name changes. | ||
"originalName": "security version compliance", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current version I have this as otherNames
so we could have the full history of a test's name.
originalName
could work, but it must be included in the ExtensionTestResult
, and have this data make its way to a column in the junit table. We'd then need to update ci-test-mapping to look at this column when considering the test ID, which should work fine for both the old way (a rename map in the ci-test-mapping repo) and the extension way (stable original name).
I wouldn't expect anyone including component readiness to group by otherNames
(or even originalName
), but rather on the test ID from a join on the ci-test-mapping table. We need to be backwards compatible with the universe today, and previous openshift releases without extension test binaries, which means continuing to use the mapping table.
Bigquery requires UNNEST for arrays/repeated records. By storing environment as a JSON object, we can use JSON_EXTRACT_SCALAR efficiently on maps with unique keys.
76ccbf2
to
cc85139
Compare
@jupierce: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
/remove-lifecycle rotten |
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
} | ||
``` | ||
|
||
### Risks and Mitigations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all tests in openshift/origin
, we need to bump k8s.io/kubernetes with many useful e2e functions only there. When we move the tests to individual repos, all these repos will endure pain with updating k8s.io/kubernetes. Upstream does not even pretend it has a stable API and it often breaks.
Alternatively, we would need to spend some non-trivial time rewriting the tests not to depend on k8s.io/kubernetes/test/e2e/framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to have a discussion about your particular use case? I might be missing some context. What e2e functions are you using from k8s.io/kubernetes/test/e2e/framework? Which tests are you looking to move to external binaries for which repos?
-
Our next OTE customer after migrating k8s-tests-ext is ovn-kubernetes, and the ginkgo tests already exist in their repos to run them. We'd mostly be running them unmodified from upstream
-
The other use case we looked at was moving QE's openshift-tests-private to the component repos, and those don't use anything from k8s.io/kubernetes/test/e2e/framework.
Optional Operator authors must ensure that the image carrying the extension binary | ||
is identified in their ClusterServiceVersion (CSV) so that tools like `oc-mirror` | ||
will copy image(s) bearing extension binaries to disconnected clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also must ensure that all images used by the extension e2e tests are mirrored by ./openshift-tests images --upstream --to-repository=xyz
.
I could be wrong here, I think this list of images seems to be currently hardwired in openshift-tests binary during build. There is already some wording about adding new images to the list here, it will need to be either much stricter or we would need to get the list of images from extension binaries too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Justin and I discussed this at one point, but I guess the outcome didn't end up in the enhancement. I think the outcome of that was new images would just need a separate PR to origin to add it. It's low frequency enough that it shouldn't be too disruptive. Most tests just use one of a handful of images (agnhost, tools, cli).
We need to solve the helper problem, though.
Optional Operator authors must ensure that the image carrying the extension binary | ||
is identified in their ClusterServiceVersion (CSV) so that tools like `oc-mirror` | ||
will copy image(s) bearing extension binaries to disconnected clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(starting a separate thread). To get the name of a mirrored image, a test is suggested to call github.com/openshift/origin/test/extended/util/image.LocationFor("my.source/image/location:versioned_tag")
here. Does the extension need to import openshift/origin
? That could be problematic, especially if the extension imports incompatible version of k8s.io/kubernetes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something we'll need to solve for sure. We want to minimize vendoring requirements in the component repos for sure. I do not want anyone to have to vendor origin code.
Perhaps we could pass the image locations to each tests binary and have OTE provide a LocationFor helper. It could be passed either as a CLI flag or a path to a JSON/YAML file.
/remove-lifecycle rotten |
colocated with the features they are testing. It defines a standardized interface | ||
for test discovery, execution, and result aggregation, allowing decentralized | ||
contributions while maintaining centralized orchestration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allowing decentralized contributions while maintaining centralized orchestration.
Is there any impact in how the new tests are added to the conformance suite, such as openshift/conformance
?
No description provided.