-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NE-1277: status: Add Gateway API objects to relatedObjects #933
NE-1277: status: Add Gateway API objects to relatedObjects #933
Conversation
@Miciah: This pull request references NE-1277 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Skipping CI for Draft Pull Request. |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
@Miciah: This pull request references NE-1277 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@Miciah: This pull request references NE-1277 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@Miciah: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
6246acc
to
78c31dd
Compare
https://github.com/openshift/cluster-ingress-operator/compare/6246acc7f400ebe0376589efa793d8033c5a8683..78c31dd241a7d0e8a579a4d1f8b870d569c1c96d rebases onto #1152 in order to get the Gateway API v1 and Sail Operator v1 CRDs. Some other changes were made redundant by #1115 and are thus dropped from this PR. |
/test e2e-aws-operator |
78c31dd
to
459761a
Compare
Rebased onto #1152. |
/assign |
Namespace: subscriptionName.Namespace, | ||
Name: subscriptionName.Name, | ||
}) | ||
if state.IngressNamespace != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a gateway to be installed outside the IngressNamespace and still be a related object?
Also, how about HTTPRoutes, GRPCRoutes, and ReferenceGrants?
Is there a way to make sure the pod that runs conformance tests ends up as a related object so we can check the logs in artifacts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a gateway to be installed outside the IngressNamespace and still be a related object?
I don't think it would be appropriate to include objects that constitute user workload, and for component-owned objects, the precedent is that the component that owns the object specifies it in its relatedObjects
field. For example, the authentication operator specifies its route in the authentication clusteroperator's relatedObjects
field: https://github.com/openshift/cluster-authentication-operator/blob/11b2201203fd5ba0dec937015fa542f4cd4c8879/pkg/operator/starter.go#L566
Also, how about HTTPRoutes, GRPCRoutes, and ReferenceGrants?
We don't have any. Again, we wouldn't want to include other components' objects or objects related to user workload. This is analogous to how the ingress clusteroperator specifies ingresscontrollers but not routes in relatedObjects
. (Come to think of it, maybe we ought to specify the canary route. I can file an issue for that if you agree.)
Is there a way to make sure the pod that runs conformance tests ends up as a related object so we can check the logs in artifacts?
I think that that would be misuse of relatedObjects
. Aside from that, the must-gather artifacts can only include objects that exist when must-gather runs, and CI runs must-gather after the tests have finished, so unless the conformance tests leak objects, must-gather wouldn't collect them anyway.
/test e2e-aws-gatewayapi-conformance |
bef05ce
to
86f0c17
Compare
https://github.com/openshift/cluster-ingress-operator/compare/bef05ce144a269f1a170e1e8c22e53a769a5128b..86f0c17f610eee033337b3710a0e728a02319db6 adds "dnsrecords" in the "openshift-ingress" namespace to the expected |
86f0c17
to
77beec1
Compare
https://github.com/openshift/cluster-ingress-operator/compare/86f0c17f610eee033337b3710a0e728a02319db6..77beec1a8b6af8c8c5ce00f29f7a60683a84cd75 adds some more logging to help diagnose test failures. |
/test e2e-aws-operator-techpreview |
@@ -36,65 +36,66 @@ import ( | |||
func TestCanaryRoute(t *testing.T) { | |||
kubeConfig, err := config.GetConfig() | |||
if err != nil { | |||
t.Fatalf("failed to get kube config: %v", err) | |||
t.Fatalf("Failed to get kube config: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any more test changes are needed, could you please put them in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean additional changes to TestCanaryRoute
or to other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split the other test fixes off into separate PRs and Jira issues if you prefer. I've just been adding them as commits in this PR as I diagnose CI failures that impact this PR.
This commit resolves NE-1277. https://issues.redhat.com/browse/NE-1277 * pkg/operator/controller/status/controller.go (New): Add a watch for subscriptions so that the status controller updates relatedObjects with the OSSM subscription when it is added or removed. (Config): Add GatewayAPIEnabled field. (Reconcile): Add dnsrecords in the operand namespace to relatedObjects. If GatewayAPIEnabled is true, add resources related to Gateway API. If haveOSSMSubscription is true, add resources that require the subscription. (operatorState): Add haveOSSMSubscription field. (getOperatorState): Set haveOSSMSubscription. * pkg/operator/operator.go (New): Specify GatewayAPIEnabled in the status controller's config. * test/e2e/operator_test.go (TestClusterOperatorStatusRelatedObjects): Expect to observe "dnsrecords" for the "openshift-ingress" namespace as well as "gatewayclasses" in relatedObjects.
* test/e2e/canary_test.go (TestCanaryRoute): Add some more logging to make it easier to diagnose test failures.
{ | ||
Group: iov1.GroupVersion.Group, | ||
Resource: "dnsrecords", | ||
Namespace: "openshift-ingress", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This test runs before TestGatewayAPI, so we do *not* expect | ||
// to see subscriptions, istios, or gateways in relatedObjects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this runs before TestGatewayAPI, what created the gatewayClass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gatewayapi controller creates the CRD if the featuregate is enabled. We only need the CRD to exist in order to add the gatewayclasses resource to relatedObjects
; we don't need any CR to exist. (However, if TestGatewayAPI
does run as part of CI, then having "gatewayclasses" in relatedObjects
will cause the CI job to gather the gatewayclass that TestGatewayAPI
created when the CI job runs the must-gather step, after the test step.)
test/e2e/util_gatewayapi_test.go
Outdated
@@ -562,14 +563,14 @@ func assertGatewaySuccessful(t *testing.T, namespace, name string) (*gatewayapiv | |||
|
|||
err := wait.PollUntilContextTimeout(context.Background(), 1*time.Second, 1*time.Minute, false, func(context context.Context) (bool, error) { | |||
if err := kclient.Get(context, nsName, gw); err != nil { | |||
t.Logf("failed to get gateway %s, retrying...", name) | |||
t.Logf("Failed to get gateway %v: %v", nsName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
t.Logf("Failed to get gateway %v: %v", nsName, err) | |
t.Logf("Failed to get gateway %v: %v, retrying...", nsName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/e2e/util_gatewayapi_test.go
Outdated
@@ -598,15 +601,15 @@ func assertHttpRouteSuccessful(t *testing.T, namespace, name string, gateway *ga | |||
// Wait 1 minute for parent/s to update | |||
err := wait.PollUntilContextTimeout(context.Background(), 1*time.Second, 1*time.Minute, false, func(context context.Context) (bool, error) { | |||
if err := kclient.Get(context, nsName, httproute); err != nil { | |||
t.Logf("failed to get httproute %s/%s, retrying...", namespace, name) | |||
t.Logf("Failed to get httproute %v: %v", nsName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more cases here that had the "retrying" before. Also, I expect that the error message is not going to change much on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back "retrying" in https://github.com/openshift/cluster-ingress-operator/compare/77beec1a8b6af8c8c5ce00f29f7a60683a84cd75..92e89a2a02fccb3d149e3e0491a30f19fe160774.
I don't think it's worth the effort to save and compare the last err.String()
to make logging slightly more compact. If it is worth that effort, then we should probably write a getWithRetry
helper and update all tests to use it.
test/e2e/util_gatewayapi_test.go
Outdated
if err := assertDNSRecord(t, dnsRecordName); err != nil { | ||
return err | ||
} | ||
|
||
// Wait and check that the dns name resolves first. Takes a long time, so | ||
// if the hostname is actually an IP address, skip this. | ||
t.Logf("Attempting to resolve %s...", hostname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be precise, it won't attempt to resolve the hostname if it is actually an IP address. This log message is in accurate in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test/e2e/util_gatewayapi_test.go
Outdated
@@ -843,20 +846,21 @@ func assertDNSRecord(t *testing.T, recordName types.NamespacedName) error { | |||
|
|||
err := wait.PollUntilContextTimeout(context.Background(), 1*time.Second, 1*time.Minute, false, func(context context.Context) (bool, error) { | |||
if err := kclient.Get(context, recordName, dnsRecord); err != nil { | |||
t.Logf("failed to get DNSRecord %s/%s: %v, retrying...", recordName.Namespace, recordName.Name, err) | |||
t.Logf("Failed to get DNSRecord %v: %v", recordName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again a case where "retrying" in the log message is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more logging to functions that testGatewayAPIObjects uses in order to make it easier to diagnose test failures. * test/e2e/gateway_api_test.go (ensureGatewayObjectSuccess): Remove t.Helper() and add more logging. This function has core test logic that can mark the test as failed, and marking this function as a helper obfuscates the point of failure. * test/e2e/util_gatewayapi_test.go (assertGatewayClassSuccessful): Add more logging. (assertGatewaySuccessful): Add more logging. (assertHttpRouteSuccessful): Add more logging. (assertHttpRouteConnection): Remove t.Helper() and add more logging. (assertDNSRecord): Add more logging.
77beec1
to
92e89a2
Compare
https://github.com/openshift/cluster-ingress-operator/compare/77beec1a8b6af8c8c5ce00f29f7a60683a84cd75..92e89a2a02fccb3d149e3e0491a30f19fe160774 moves an import and makes some log messages more consistent. |
Some earlier runs of e2e-aws-operator and e2e-aws-operator-techpreview failed on |
Many CI errors earlier today were caused by a Marketplace outage. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: candita The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1 similar comment
@Miciah: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
hypershift-e2e-aks failed because
/test hypershift-e2e-aks |
81e314f
into
openshift:master
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-ingress-operator |
pkg/operator/controller/names.go
(GlobalOperatorsNamespace
): New const for the "openshift-operators" namespace.(
ServiceMeshSubscriptionName
): UseGlobalOperatorsNamespace
.pkg/operator/controller/status/controller.go
(New
): Add a watch for subscriptions so that the status controller updatesrelatedObjects
when the OSSM subscription is added or removed.(
Config
): AddGatewayAPIEnabled
field.(
Reconcile
): IfGatewayAPIEnabled
is true, add resources related to Gateway API torelatedObjects
. IfhaveOSSMSubscription
is true, add resources that require the subscription.(
operatorState
): AddhaveOSSMSubscription
field.(
getOperatorState
): SethaveOSSMSubscription
.pkg/operator/operator.go
(New
): Watch the "openshift-operators" namespace, and specifyGatewayAPIEnabled
in the status controller's config.