Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-33656: Remove ip xfrm state when IPsec is disabled #2372

Merged
merged 2 commits into from
Dec 10, 2024

Conversation

pperiyasamy
Copy link
Member

The ovn-ipsec-host daemonset doesn't remove ipsec state when IPsec for east west traffic is disabled. This might still keep running data plane traffic with ipsec encrypted. So this PR cleans up ipsec state so that east west traffic restored with only geneve header.

@openshift-ci openshift-ci bot requested review from JacobTanenbaum and trozet May 16, 2024 11:10
@pperiyasamy pperiyasamy marked this pull request as draft May 16, 2024 13:03
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2024
@pperiyasamy pperiyasamy force-pushed the remove-ipsec-state branch 3 times, most recently from 05e8f36 to dbebf6c Compare May 21, 2024 13:23
@pperiyasamy pperiyasamy marked this pull request as ready for review May 21, 2024 13:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2024
@pperiyasamy pperiyasamy changed the title Remove ip xfrm state when IPsec is disabled OCPBUGS-33656: Remove ip xfrm state when IPsec is disabled May 21, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 21, 2024
@openshift-ci-robot
Copy link
Contributor

@pperiyasamy: This pull request references Jira Issue OCPBUGS-33656, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The ovn-ipsec-host daemonset doesn't remove ipsec state when IPsec for east west traffic is disabled. This might still keep running data plane traffic with ipsec encrypted. So this PR cleans up ipsec state so that east west traffic restored with only geneve header.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 21, 2024
@openshift-ci openshift-ci bot requested a review from anuragthehatter May 21, 2024 13:25
@pperiyasamy
Copy link
Member Author

/assign @jcaamano @kyrtapz @yuvalk

@pperiyasamy
Copy link
Member Author

/retest

# entries created for ovs. Since it's cleaning up whole entries,
# it may cause slight interruption in case of external ipsec
# configuration.
echo "remove ovs ipsec state if present"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem is, this also removes the N-S state, not just ovs
so need to verify OVNIPsecEnable actually means both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it also does ipsec service restart and wouldn't this get ipsec state for NS back into the system ?
Moreover this logic is similar to what ovn-ipsec container does here and same thing happens for container restart scenarios as well.

The OVNIPsecEnable flag is used only to identify if EW IPsec is enabled or not.

The IPsec mode is changed in following order Full -> Disabled -> External, EW ipsec connections in openshift.conf occasionally not cleaned up even after IPsec disabled, then external mode gets back those connections active again. we may delete only openshift.conf file to handle this case, but not sure about the behavior for Full -> External . Hence this change is introduced to handle all the cases.

ip x p flush
rm -f /etc/ipsec.d/openshift.conf || true
# if pluto is running, we need to restart it after the flush
chroot /proc/1/root ipsec restart || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bengal I remember there used to be a NM issue around this, not sure if fixed?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, restarting ipsec while there are active connections created by NM-libreswan would break them. You need to reactivate such connections manually to restore connectivity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt feels very production grade.
if pluto daemon restart, for any reason (error? crash? etc)
all the connections will die and wont even attempt to restart

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in case of crashes that's a problem. Is this issue already filed somewhere? If not please do, so that we can plan how to improve this in NM-libreswan.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2024
# entries created for ovs. Since it's cleaning up whole entries,
# it may cause slight interruption in case of external ipsec
# configuration.
echo "remove ovs ipsec state if present"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another important issue with this, is that AFAIK, it'll create connectivity issues for workloads!

removing the state, forces ipsec off, but if the other node is not updated at exact the same moment, it'll still try to do ipsec (according to it's policy)

I dont think there's a good solution for this besides leaving ipsec on until next reboot

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2024
@pperiyasamy
Copy link
Member Author

/test e2e-aws-ovn-ipsec-upgrade

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 7, 2024
@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Dec 4, 2024
Copy link
Contributor

openshift-ci bot commented Dec 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, pperiyasamy, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pperiyasamy
Copy link
Member Author

/assign @huiran0826 @anuragthehatter

@pperiyasamy
Copy link
Member Author

The ipsec test failures with e2e-aws-ovn-ipsec-serial lane are known failures, part of it is fixed with PR openshift/origin#28797 (external mode rollout completeness check) and remaining failures are due to bug OCPBUGS-41823, seeing the error from pluto logs.

Dec 04 17:06:59.881622 ip-10-0-1-22 pluto[27549]: "ovn-e0a41e-0-out-1" #49: No IKEv2 connection found with compatible Traffic Selectors
Dec 04 17:06:59.881637 ip-10-0-1-22 pluto[27549]: "ovn-e0a41e-0-out-1" #49: responding to CREATE_CHILD_SA message (ID 2) from 10.0.108.160:500 with encrypted notification TS_UNACCEPTABLE

@huiran0826
Copy link

huiran0826 commented Dec 5, 2024

Pre-merge tested this PR, after changing ipsec mode from full to external. I checked below items

  1. ovn-ipsec-host pods disappeared and ipsec service was not removed from node.
  2. ipsec state/policy and /etc/ipsec.d/openshift.conf was cleaned up
    Regarding upgrade, I did an upgrade from 4.18.0-ec.3 to the PR build, upgrade succeeded
    Before upgrade
    % oc get pods -n openshift-ovn-kubernetes -l app=ovn-ipsec
    NAME READY STATUS RESTARTS AGE
    ovn-ipsec-host-crzgp 1/1 Running 0 15m
    ovn-ipsec-host-fcr2m 1/1 Running 0 15m
    ovn-ipsec-host-gj7n4 1/1 Running 0 15m
    ovn-ipsec-host-h84n2 1/1 Running 0 15m
    ovn-ipsec-host-rwssh 1/1 Running 0 15m
    ovn-ipsec-host-smkdk 1/1 Running 0 15m
    After ugprade
    % oc get pods -n openshift-ovn-kubernetes
    NAME READY STATUS RESTARTS AGE
    ovn-ipsec-host-8d7x5 2/2 Running 0 71m
    ovn-ipsec-host-h6k9s 2/2 Running 0 71m
    ovn-ipsec-host-h9wd6 2/2 Running 0 71m
    ovn-ipsec-host-j8ksn 2/2 Running 0 71m
    ovn-ipsec-host-smxjn 2/2 Running 0 71m
    ovn-ipsec-host-xv8gq 2/2 Running 0 71m

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Dec 5, 2024
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Dec 5, 2024
@openshift-ci-robot
Copy link
Contributor

@pperiyasamy: This pull request references Jira Issue OCPBUGS-33656, which is invalid:

  • expected the bug to target either version "4.19." or "openshift-4.19.", but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

The ovn-ipsec-host daemonset doesn't remove ipsec state when IPsec for east west traffic is disabled. This might still keep running data plane traffic with ipsec encrypted. So this PR cleans up ipsec state so that east west traffic restored with only geneve header.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pperiyasamy
Copy link
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Dec 5, 2024
@openshift-ci-robot
Copy link
Contributor

@pperiyasamy: This pull request references Jira Issue OCPBUGS-33656, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huiran0826

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from huiran0826 December 5, 2024 13:45
@pperiyasamy
Copy link
Member Author

/test e2e-metal-ipi-ovn-ipv6-ipsec

@jcaamano
Copy link
Contributor

jcaamano commented Dec 9, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2024
@jcaamano
Copy link
Contributor

jcaamano commented Dec 9, 2024

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Dec 9, 2024
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD eff7f1d and 2 for PR HEAD b25b8e5 in total

@pperiyasamy
Copy link
Member Author

/retest

@openshift-merge-bot openshift-merge-bot bot merged commit 95f4792 into openshift:master Dec 10, 2024
31 of 37 checks passed
@openshift-ci-robot
Copy link
Contributor

@pperiyasamy: Jira Issue OCPBUGS-33656: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-33656 has been moved to the MODIFIED state.

In response to this:

The ovn-ipsec-host daemonset doesn't remove ipsec state when IPsec for east west traffic is disabled. This might still keep running data plane traffic with ipsec encrypted. So this PR cleans up ipsec state so that east west traffic restored with only geneve header.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Dec 10, 2024

@pperiyasamy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-sdn-upgrade dbebf6c link false /test e2e-aws-sdn-upgrade
ci/prow/e2e-aws-live-migration-sdn-ovn-rollback dbebf6c link false /test e2e-aws-live-migration-sdn-ovn-rollback
ci/prow/security b25b8e5 link false /test security
ci/prow/e2e-aws-ovn-ipsec-serial b25b8e5 link false /test e2e-aws-ovn-ipsec-serial
ci/prow/okd-scos-e2e-aws-ovn b25b8e5 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 b25b8e5 link false /test e2e-vsphere-ovn-dualstack-primaryv6

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-network-operator
This PR has been included in build cluster-network-operator-container-v4.19.0-202412100939.p0.g95f4792.assembly.stream.el9.
All builds following this will include this PR.

@pperiyasamy
Copy link
Member Author

/cherry-pick release-4.18

@openshift-cherrypick-robot

@pperiyasamy: new pull request created: #2592

In response to this:

/cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet