
RDMA CNI plugin

CNI compliant plugin for network namespace aware RDMA interfaces.

RDMA CNI plugin allows network namespace isolation for RDMA workloads in a containerized environment.

Overview

RDMA CNI plugin is intended to be run as a chained CNI plugin (introduced in CNI Specifications v0.3.0). It ensures isolation of RDMA traffic from other workloads in the system by moving the associated RDMA interfaces of the provided network interface to the container's network namespace path.

The main use-case (for now...) is for containerized SR-IOV workloads orchestrated by Kubernetes that perform RDMA and wish to leverage the network namespace isolation of RDMA devices introduced in Linux kernel 5.3.0.

Requirements

Hardware

SR-IOV capable NIC which supports RDMA.

Supported Hardware

Mellanox Network adapters

ConnectX®-4 and above

Operating System

Linux distribution

Kernel

A kernel based on version 5.3.0 or newer, with the RDMA modules loaded in the system. The rdma-core package provides the means to automatically load the relevant modules on system start.

Note: For deployments that use the Mellanox out-of-tree driver (Mellanox OFED), Mellanox OFED version 4.7 or newer is required. In that case a kernel based on 5.3.0 or newer is not required.
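
A quick way to sanity-check the kernel requirement on a node (a sketch; the mlx5_ib module name applies to Mellanox ConnectX adapters, other hardware uses different driver modules):

~$ uname -r                           # expect 5.3.0 or newer (unless using Mellanox OFED >= 4.7)
~$ lsmod | grep -E 'ib_core|mlx5_ib'  # RDMA core and HCA driver modules should be loaded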

Packages

An iproute2 package based on kernel 5.3.0 or newer installed on the system.

Note: It is recommended that the required packages are installed by your system's package manager.

Note: For deployments using Mellanox OFED, the iproute2 package is bundled with the driver under /opt/mellanox/iproute2/.
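
To check which rdma tool build is in use (the OFED path below is an assumption based on the note above; the exact location under /opt/mellanox/iproute2/ may differ):

~$ rdma -V                               # version of the system iproute2 rdma tool
~$ /opt/mellanox/iproute2/sbin/rdma -V   # rdma tool bundled with Mellanox OFED (path may vary)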

Deployment requirements (Kubernetes)

For a Kubernetes deployment, each SR-IOV capable worker node should have the companion components used in the examples below deployed: the SR-IOV network device plugin (configured with RDMA support), the SR-IOV CNI and a meta plugin such as Multus. Please refer to each component's documentation for deployment instructions.

Note: Kubernetes version 1.16 or newer is required for deploying as a daemonset.

RDMA CNI configurations

{
  "cniVersion": "0.3.1",
  "type": "rdma",
  "args": {
    "cni": {
      "debug": true
    }
  }
}

Note: "args" keyword is optional.

Deployment

System configuration

It is recommended to set RDMA subsystem namespace awareness mode to exclusive on OS boot.

Set RDMA subsystem namespace awareness mode to exclusive via ib_core module parameter:

~$ echo "options ib_core netns_mode=0" >> /etc/modprobe.d/ib_core.conf

Set RDMA subsystem namespace awareness mode to exclusive via rdma tool:

~$ rdma system set netns exclusive

Note: When changing the RDMA subsystem netns mode, the kernel requires that no network namespaces exist in the system.
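
The currently configured mode can be verified with the rdma tool (the exact output wording may vary with the iproute2 version):

~$ rdma system show   # output should report: netns exclusive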

Deploy RDMA CNI

~$ kubectl apply -f ./deployment/rdma-cni-daemonset.yaml
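
To verify the plugin pods came up on each SR-IOV capable worker node (a sketch; the pod names and namespace depend on the daemonset manifest, kube-system is an assumption):

~$ kubectl -n kube-system get pods -o wide | grep rdma-cni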

Deploy workload

An example Pod definition can be found below.

~$ kubectl apply -f ./examples/my_rdma_test_pod.yaml

Pod example:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-net
spec:
  containers:
    - name: rdma-app
      image: centos/tools
      imagePullPolicy: IfNotPresent
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 300000; done;" ]
      resources:
        requests:
          mellanox.com/sriov_rdma: '1'
        limits:
          mellanox.com/sriov_rdma: '1'
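
Once the pod is Running, the RDMA device associated with the allocated VF should be visible only from inside the pod's network namespace. A minimal check, assuming the iproute2 rdma tool is available in the container image:

~$ kubectl exec -it rdma-test-pod -- rdma link show   # lists the RDMA links visible in the pod's netns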

SR-IOV Network Device Plugin ConfigMap example

The following yaml defines an RDMA-enabled SR-IOV resource pool named mellanox.com/sriov_rdma.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
           "resourcePrefix": "mellanox.com",
           "resourceName": "sriov_rdma",
           "selectors": {
               "isRdma": true,
               "vendors": ["15b3"],
               "pfNames": ["enp4s0f0"]
           }
        }
      ]
    }
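
Once the SR-IOV network device plugin has processed this configuration, the resource should appear in the node's allocatable resources. A quick check (replace <node-name> with an SR-IOV capable worker node):

~$ kubectl get node <node-name> -o jsonpath='{.status.allocatable}'   # output should include "mellanox.com/sriov_rdma"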

Network CRD example

The following yaml defines a network, sriov-rdma-net, associated with an RDMA-enabled resource, mellanox.com/sriov_rdma.

For Pods that request this network, the CNI plugins executed in a chain are the sriov and rdma CNIs.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-rdma-net
  annotations:
    k8s.v1.cni.cncf.io/resourceName: mellanox.com/sriov_rdma
spec:
  config: '{
             "cniVersion": "0.3.1",
             "name": "sriov-rdma-net",
             "plugins": [{
                          "type": "sriov",
                          "ipam": {
                            "type": "host-local",
                            "subnet": "10.56.217.0/24",
                            "routes": [{
                              "dst": "0.0.0.0/0"
                            }],
                            "gateway": "10.56.217.1"
                          }
                        },
                        {
                          "type": "rdma"
                        }]
           }'
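
To confirm the network was created (this requires the NetworkAttachmentDefinition CRD, e.g. as installed by Multus):

~$ kubectl get network-attachment-definitions sriov-rdma-net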

Development

It is recommended to use the same Go version as defined in .travis.yml to avoid potential build-related issues during development (newer versions will most likely work as well).

Build from source

~$ git clone https://github.com/k8snetworkplumbingwg/rdma-cni.git
~$ cd rdma-cni
~$ make

Upon a successful build, the rdma binary can be found under ./build. For small deployments (e.g. a Kubernetes test cluster or an all-in-one K8s deployment) you can:

  1. Copy the rdma binary to the CNI binary directory on each worker node (see the sketch after this list).
  2. Build a container image, push it to your own image repository, then modify the deployment template and deploy.
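
A minimal sketch of step 1, assuming the default CNI binary directory /opt/cni/bin (the worker node host name is a placeholder; adjust the path for your cluster):

~$ scp ./build/rdma <worker-node>:/tmp/rdma
~$ ssh <worker-node> sudo install -m 0755 /tmp/rdma /opt/cni/bin/rdma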

Run tests:

~$ make tests

Build image:

~$ make image

Limitations

Ethernet

RDMA workloads utilizing RDMA Connection Manager (CM)

For Mellanox hardware, due to a kernel limitation, it is required to pre-allocate MAC addresses for all VFs in the deployment if an RDMA workload wishes to utilize RDMA CM to establish a connection.

This is done in the following manner:

Set the VF administrative MAC address:

$ ip link set <pf-netdev> vf <vf-index> mac <mac-address>

Unbind/bind the VF driver:

$ echo <vf-pci-address> > /sys/bus/pci/drivers/mlx5_core/unbind
$ echo <vf-pci-address> > /sys/bus/pci/drivers/mlx5_core/bind

Example:

$ ip link set enp4s0f0 vf 3 mac 02:03:00:00:48:56
$ echo 0000:03:00.5 > /sys/bus/pci/drivers/mlx5_core/unbind
$ echo 0000:03:00.5 > /sys/bus/pci/drivers/mlx5_core/bind

Doing so will populate the VF's node and port GUIDs, which are required for RDMA CM to establish a connection.
