[TECH PREVIEW] Configuration Assistance with NVIDIA Kubernetes Launch Kit
NVIDIA Kubernetes Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool helps provide flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.
Prerequisites
For prerequisites, refer to the NVIDIA Network Operator Deployment Guide with Kubernetes page.
You will need a Kubernetes cluster with the NVIDIA Network Operator Helm chart installed.
Operation Phases
Discover Cluster Configuration
Deploy a minimal Network Operator profile to automatically discover your cluster’s network capabilities and hardware configuration. This phase can be skipped if you provide your own configuration file.
Select the Deployment Profile
Specify the desired deployment profile with CLI flags or with a natural-language prompt for the LLM.
Generate Deployment Files
Based on the discovered or provided configuration, generate a complete set of YAML deployment files tailored to your selected network profile.
Deploy to Cluster
Apply the generated deployment files to your Kubernetes cluster. This phase is optional: it runs only when the --deploy flag is set, and it requires --kubeconfig to be specified.
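The way the four phases depend on the flags can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not l8k's actual code: the function name `planned_phases` and the phase labels are hypothetical, and the check that a run needs either discovery or a user-provided configuration is an assumption inferred from the description of the discovery phase.

```python
from typing import List, Optional


def planned_phases(discover: bool,
                   user_config: Optional[str],
                   deploy: bool,
                   kubeconfig: Optional[str]) -> List[str]:
    """Return the phases a run would go through (hypothetical helper)."""
    # Assumption: without discovery, a configuration file must be supplied.
    if not discover and user_config is None:
        raise ValueError("need either discovery or a user-provided config")
    # Stated in the text: the deploy phase requires --kubeconfig.
    if deploy and kubeconfig is None:
        raise ValueError("--deploy requires --kubeconfig")

    phases = []
    if discover:
        phases.append("discover")
    phases.append("select-profile")
    phases.append("generate")
    if deploy:
        phases.append("deploy")
    return phases
```

For example, `planned_phases(False, "config.yaml", False, None)` yields only the profile selection and generation phases.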
Supported Deployment Profiles
Kubernetes Launch Kit supports the following deployment profiles:
| Profile | Fabric | Deployment Type | Notes |
|---|---|---|---|
| SR-IOV Ethernet RDMA | ethernet | sriov | High-performance networking with hardware acceleration and dedicated VF resources. |
| SR-IOV InfiniBand RDMA | infiniband | sriov | Virtualized InfiniBand with hardware acceleration and isolated IB partitions. |
| IP over InfiniBand with RDMA Shared Device | infiniband | rdma_shared | InfiniBand networking with shared RDMA resources for parallel I/O workloads. |
| Host Device RDMA | any | host_device | Direct hardware access for legacy applications requiring exclusive device control. |
| MacVLAN with RDMA Shared Device | ethernet | rdma_shared | Network isolation with shared RDMA capabilities for multi-tenant environments. |
| Spectrum-X | ethernet | sriov | Spectrum-X networking with |
Please refer to the Quick Start Guide for Kubernetes page for more details.
For Spectrum-X configuration, refer to the [TECH PREVIEW] NVIDIA Spectrum-X NIC Configuration page.
Heterogeneous Cluster Support
During cluster discovery, nodes are automatically grouped by their NIC configuration. Nodes with identical PCI addresses and device IDs (the same PF fingerprint) are placed in the same group. Each group receives:
- A unique identifier (empty string for single-group clusters; group-0, group-1, etc. for multi-group clusters)
- An auto-computed nodeSelector based on labels that distinguish the group from others
- machineType and productType extracted from the nvidia.com/gpu.machine and nvidia.com/gpu.product node labels
Templates that reference cluster configuration are rendered once per group, producing separate output files per group (e.g., 30-sriovnetworknodepolicy-group-0.yaml, 30-sriovnetworknodepolicy-group-1.yaml).
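The grouping rule can be illustrated with a short Python sketch. It is not the tool's implementation. It assumes that a PF fingerprint is the sorted set of (PCI address, device ID) pairs and that groups are numbered in sorted fingerprint order, and the helper names `group_nodes` and `output_name` are hypothetical. The file name for a single-group cluster (no suffix) is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A PF is described here by (pciAddress, deviceID), following the text above.
PF = Tuple[str, str]


def group_nodes(node_pfs: Dict[str, List[PF]]) -> List[dict]:
    """Group nodes whose PF sets are identical (illustrative only)."""
    by_fingerprint = defaultdict(list)
    for node, pfs in node_pfs.items():
        fingerprint = tuple(sorted(pfs))  # order-independent fingerprint
        by_fingerprint[fingerprint].append(node)

    groups = []
    single = len(by_fingerprint) == 1
    # Assumption: group indices follow the sorted fingerprint order.
    for index, (fingerprint, nodes) in enumerate(sorted(by_fingerprint.items())):
        groups.append({
            "identifier": "" if single else f"group-{index}",
            "workerNodes": sorted(nodes),
            "pfs": [{"pciAddress": a, "deviceID": d} for a, d in fingerprint],
        })
    return groups


def output_name(template: str, identifier: str) -> str:
    """File name for one rendered template, with a per-group suffix if any."""
    suffix = f"-{identifier}" if identifier else ""
    return f"{template}{suffix}.yaml"
```

With two distinct fingerprints, `output_name("30-sriovnetworknodepolicy", "group-0")` produces the per-group file name shown above.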
Use the --group flag to generate manifests for a single group:
l8k --user-config ./config.yaml \
--fabric infiniband --deployment-type sriov --multirail \
--group group-0 \
--save-deployment-files ./deployments
North-South Traffic Detection
During cluster discovery, the tool automatically identifies BlueField DPU devices by matching each device’s part number against a known list of DPU product codes. Devices matching a DPU product code are classified as north-south traffic (management/external), while all other devices (SuperNICs, ConnectX NICs) are classified as east-west traffic (GPU interconnect).
North-south PFs are included in the saved cluster configuration for visibility, but are automatically filtered out during template rendering so that only east-west PFs appear in the generated deployment manifests.
Each east-west PF is assigned a sequential rail number (rail: 0, rail: 1, rail: 2, …) used for naming resources such as SriovNetworkNodePolicy and IPPool entries.
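A minimal Python sketch of the classification and rail numbering described above follows. The `partNumber` field, the contents of the DPU part-number set, and the function names are hypothetical; the real list of DPU product codes is internal to the tool. The sketch also assumes that rails are numbered in the order the PFs are listed.

```python
from typing import Dict, List, Set


def classify_pfs(pfs: List[Dict[str, str]], dpu_part_numbers: Set[str]) -> List[dict]:
    """Tag each PF with a traffic direction and number east-west PFs as rails."""
    result = []
    next_rail = 0
    for pf in pfs:
        entry = dict(pf)  # copy so the input is left untouched
        if pf["partNumber"] in dpu_part_numbers:
            entry["traffic"] = "north-south"  # DPU match: no rail assigned
        else:
            entry["traffic"] = "east-west"
            entry["rail"] = next_rail  # sequential rail index
            next_rail += 1
        result.append(entry)
    return result


def east_west_only(pfs: List[dict]) -> List[dict]:
    """Filter applied before rendering: drop north-south PFs."""
    return [pf for pf in pfs if pf["traffic"] == "east-west"]
```

The saved configuration would keep the full output of `classify_pfs`, while templates would only see the result of `east_west_only`.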
Usage
Kubernetes Launch Kit is available as a Docker container:
mkdir ~/cluster-configuration
cp /etc/kubernetes/admin.conf ~/cluster-configuration/kubeconfig
docker run -v ~/cluster-configuration:/cluster-configuration \
--net=host \
nvcr.io/nvidia/cloud-native/k8s-launch-kit:v26.1.0 \
--discover-cluster-config \
--kubeconfig /cluster-configuration/kubeconfig \
--save-cluster-config /cluster-configuration/config.yaml \
--log-level debug \
--save-deployment-files /cluster-configuration/deployments \
--fabric infiniband --deployment-type rdma_shared --multirail
Note
You must enable --net=host and mount the necessary directories for input and output files with -v.
CLI Reference
General Flags
| Flag | Description |
|---|---|
| | Comma-separated list of plugins to enable (default: |
| --log-level | Log level: debug, info, warn, error |
| | Write logs to file instead of stderr |
| | Show help |
Cluster Discovery Flags
| Flag | Description |
|---|---|
| --discover-cluster-config | Deploy a thin Network Operator profile to discover cluster capabilities |
| --save-cluster-config | Save discovered cluster configuration to the specified path (defaults to |
| --user-config | Use provided cluster configuration file (as base config for discovery or as full config without discovery) |
| --kubeconfig | Path to kubeconfig file for cluster operations |
| --label-selector | Filter nodes for discovery by label (default: |
| | Override the network operator namespace from the config file |
Profile Selection Flags
| Flag | Description |
|---|---|
| --fabric | Select the fabric type to deploy (infiniband, ethernet) |
| --deployment-type | Select the deployment type (sriov, rdma_shared, host_device) |
| --multirail | Enable multirail deployment |
| --spectrum-x | Enable Spectrum-X deployment |
| | Enable AI deployment |
| --group | Generate templates for a specific node group only (e.g., group-0) |
Note
If the configuration file provided via --user-config already contains a complete profile: section, the --fabric, --deployment-type, and other profile selection flags are not required. CLI flags override values from the configuration file when both are provided.
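The precedence rule in this note (file values first, CLI flags on top) can be sketched as follows. This is an illustration, not l8k's code: the function name is hypothetical, and treating `fabric` and `deployment` as the fields that make a profile complete is an assumption, since the text does not list the required fields.

```python
from typing import Any, Dict, Optional

# Assumption: these two fields are what makes a profile "complete".
REQUIRED = ("fabric", "deployment")


def effective_profile(file_profile: Optional[Dict[str, Any]],
                      cli_flags: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the profile: section of the config file with CLI flag values."""
    merged = dict(file_profile or {})
    # Only flags the user actually passed (not None) override the file.
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    missing = [k for k in REQUIRED if k not in merged]
    if missing:
        raise ValueError(f"profile is incomplete, missing: {', '.join(missing)}")
    return merged
```

Passing `{"fabric": "infiniband", "deployment": None}` as flags on top of a file profile with `fabric: ethernet` and `deployment: sriov` keeps the deployment type from the file and replaces only the fabric.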
Spectrum-X Flags
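| Flag | Description |
|---|---|
| --spcx-version | Spectrum-X version (default: RA2.1) |
| --multiplane-mode | Multiplane mode: swplb, hwplb, uniplane, none |
| --number-of-planes | Number of planes for multiplane deployment (required when mode is not none) |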
Note
When --spectrum-x is specified, --fabric, --deployment-type, and --multirail are automatically set to ethernet, sriov, and true respectively. The --spcx-version defaults to RA2.1.
LLM Flags
| Flag | Description |
|---|---|
| --prompt | Path to file with a prompt to use for LLM-assisted profile generation |
| --llm-interactive | Enable interactive chat mode with the LLM agent |
| --llm-api-key | API key for the LLM API |
| --llm-api-url | API URL for the LLM API |
| --llm-vendor | Vendor of the LLM API (openai, openai-azure, anthropic, gemini). Default: |
| --llm-model | Custom model name for the LLM API |
Deployment Flags
| Flag | Description |
|---|---|
| --save-deployment-files | Save generated deployment files to the specified directory (default: |
| --deploy | Deploy the generated files to the Kubernetes cluster |
Usage Examples
Complete Workflow
Discover cluster config, generate files, and deploy:
l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments \
--deploy --kubeconfig ~/.kube/config
Discover Cluster Configuration
l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
--kubeconfig ~/.kube/config
Filter discovery to specific nodes using a label selector:
l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
--label-selector "feature.node.kubernetes.io/pci-15b3.present=true" \
--kubeconfig ~/.kube/config
Discovery with User-Provided Base Config
Use your own config file (with custom network operator version, subnets, etc.) as the base for discovery. Without --save-cluster-config, the file is rewritten in place with discovery results:
l8k --user-config ./my-config.yaml --discover-cluster-config \
--kubeconfig ~/.kube/config
Save discovery results to a separate file instead:
l8k --user-config ./my-config.yaml --discover-cluster-config \
--save-cluster-config ./discovered-config.yaml \
--kubeconfig ~/.kube/config
Use Existing Configuration
Generate and deploy with pre-existing config:
l8k --user-config ./existing-config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--deploy --kubeconfig ~/.kube/config
Generate Deployment Files
l8k --user-config ./config.yaml \
--fabric ethernet --deployment-type sriov --multirail \
--save-deployment-files ./deployments
Generate Deployment Files for a Specific Node Group
In heterogeneous clusters, discovery produces multiple node groups. Use --group to generate manifests for a single group:
l8k --user-config ./config.yaml \
--fabric infiniband --deployment-type sriov --multirail \
--group group-0 \
--save-deployment-files ./deployments
Spectrum-X Deployment
The --spectrum-x flag automatically sets --fabric to ethernet, --deployment-type to sriov, and --multirail to true. You must specify --multiplane-mode to select the multiplane mode, and --number-of-planes when the mode is not none. The --spcx-version defaults to RA2.1.
For more information on Spectrum-X configuration, refer to the [TECH PREVIEW] NVIDIA Spectrum-X NIC Configuration page.
Note
NIC type constraints for multiplane modes:
- ConnectX-8 (device ID 1023): supports swplb, hwplb, and uniplane modes
- BlueField-3 SuperNIC (device ID a2dc): only supports none mode
HWPLB Mode
Hardware Plane Load Balancing for larger-scale clusters with 2-tier or 3-tier switch topologies.
l8k --user-config ./config.yaml --spectrum-x \
--multiplane-mode hwplb --number-of-planes 4 \
--save-deployment-files ./deployments
SWPLB Mode
Software Plane Load Balancing generates separate resources for each rail and each plane. Suitable for smaller-scale clusters.
l8k --user-config ./config.yaml --spectrum-x \
--multiplane-mode swplb --number-of-planes 2 \
--save-deployment-files ./deployments
Uniplane Mode
Unified plane mode with no plane separation. Simplest topology for ConnectX-8. Forces --number-of-planes to 1.
l8k --user-config ./config.yaml --spectrum-x \
--multiplane-mode uniplane \
--save-deployment-files ./deployments
Single Plane (None)
No multiplane separation. Use with BlueField-3 SuperNIC or simple topologies.
l8k --user-config ./config.yaml --spectrum-x \
--multiplane-mode none \
--save-deployment-files ./deployments
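The Spectrum-X rules from this section (implied profile values, NIC type constraints, and plane-count requirements) can be collected into one small validation sketch. It is a hedged illustration rather than the tool's logic: the function name and the returned dictionary layout are hypothetical, and the mode table simply mirrors the NIC constraints note above.

```python
from typing import Optional

# Modes per NIC device ID, as listed in the NIC type constraints note.
SUPPORTED_MODES = {
    "1023": {"swplb", "hwplb", "uniplane"},  # ConnectX-8
    "a2dc": {"none"},                        # BlueField-3 SuperNIC
}


def resolve_spectrum_x(nic_type: str, mode: str,
                       number_of_planes: Optional[int] = None,
                       spcx_version: str = "RA2.1") -> dict:
    """Validate Spectrum-X options and return the implied profile values."""
    if mode not in SUPPORTED_MODES.get(nic_type, set()):
        raise ValueError(f"NIC {nic_type} does not support mode {mode!r}")
    if mode == "uniplane":
        number_of_planes = 1  # uniplane forces a single plane
    elif mode != "none" and number_of_planes is None:
        raise ValueError("--number-of-planes is required for this mode")
    return {
        # --spectrum-x implies these three profile values.
        "fabric": "ethernet",
        "deployment": "sriov",
        "multirail": True,
        "spectrumX": {
            "spcxVersion": spcx_version,
            "multiplaneMode": mode,
            "numberOfPlanes": number_of_planes,
        },
    }
```

Calling `resolve_spectrum_x("1023", "hwplb", 4)` corresponds to the HWPLB command shown earlier, while `resolve_spectrum_x("a2dc", "swplb", 2)` is rejected because of the NIC constraint.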
LLM-Assisted Profile Selection
Kubernetes Launch Kit supports LLM-assisted profile generation. An AI agent analyzes your cluster configuration and requirements to recommend the optimal deployment profile.
Supported LLM vendors:
- openai: OpenAI API
- openai-azure: Azure OpenAI Service
- anthropic: Anthropic API
- gemini: Google Gemini API
Non-Interactive Mode
Provide a prompt file with --prompt for single-shot profile generation.
echo "I want to enable multirail networking in my AI cluster" > requirements.txt
l8k --user-config ./config.yaml \
--prompt requirements.txt --llm-vendor openai-azure \
--llm-api-key <OPENAI_API_KEY> --llm-api-url <OPENAI_API_URL> \
--llm-model <OPENAI_MODEL> \
--save-deployment-files ./deployments
Using Anthropic:
l8k --user-config ./config.yaml \
--prompt requirements.txt --llm-vendor anthropic \
--llm-api-key <ANTHROPIC_API_KEY> --llm-api-url <ANTHROPIC_API_URL> \
--llm-model <ANTHROPIC_MODEL> \
--save-deployment-files ./deployments
Using Gemini:
l8k --user-config ./config.yaml \
--prompt requirements.txt --llm-vendor gemini \
--llm-api-key <GEMINI_API_KEY> --llm-api-url <GEMINI_API_URL> \
--llm-model <GEMINI_MODEL> \
--save-deployment-files ./deployments
Interactive Chat Mode
Use --llm-interactive for a back-and-forth conversation with the AI agent. The agent will ask clarifying questions and help you select the optimal profile. Type generate to confirm and generate manifests.
Note
--prompt and --llm-interactive cannot be used together.
l8k --user-config ./config.yaml \
--llm-interactive --llm-vendor anthropic \
--llm-api-key <KEY> --llm-api-url <API_URL> \
--llm-model <LLM_MODEL> \
--save-deployment-files ./deployments
HTML Overview Generation
An overview.html file is generated alongside the YAML deployment files. It contains the profile description, notes, collapsible file contents, and a link to the deployment guide. The path to the generated HTML file is automatically printed to the console.
Configuration File Format
After the cluster configuration is discovered, the tool saves it to a file.
You can use this file as a starting point for your own configuration. Provide a custom configuration file with the --user-config flag, either as a standalone config (skipping discovery) or as a base config combined with --discover-cluster-config (discovery takes the network operator parameters from the file and adds the discovered cluster configuration).
networkOperator:
  version: v26.1.0
  componentVersion: network-operator-v26.1.0
  repository: nvcr.io/nvidia/cloud-native
  namespace: nvidia-network-operator
  docsBaseURL: https://docs.nvidia.com/networking/display/kubernetes2610
docaDriver:
  version: doca3.3.0-26.01-1.0.0.0-0
  unloadStorageModules: false
  enableNFSRDMA: false
nvIpam:
  poolName: nv-ipam-pool
  subnets:
    - subnet: 192.168.2.0/24
      gateway: 192.168.2.1
    - subnet: 192.168.3.0/24
      gateway: 192.168.3.1
    - subnet: 192.168.4.0/24
      gateway: 192.168.4.1
    - subnet: 192.168.5.0/24
      gateway: 192.168.5.1
sriov:
  ethernetMtu: 9000
  infinibandMtu: 4000
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
  networkName: sriov-network
hostdev:
  resourceName: hostdev-resource
  networkName: hostdev-network
rdmaShared:
  resourceName: rdma_shared_resource
  hcaMax: 63
ipoib:
  networkName: ipoib-network
macvlan:
  networkName: macvlan-network
spectrumX:
  nicType: "1023" # "1023" for ConnectX-8, "a2dc" for BlueField-3 SuperNIC
  overlay: "none"
  rdmaPrefix: "roce_p%plane%_r%rail%"
  netdevPrefix: "eth_p%plane%_r%rail%"
profile:
  fabric: ethernet # infiniband, ethernet
  deployment: sriov # rdma_shared, sriov, host_device
  multirail: false
  spectrumX: # Spectrum-X configuration (set to null or omit if not using Spectrum-X)
    spcxVersion: "RA2.1" # CLI override: --spcx-version
    multiplaneMode: swplb # CLI override: --multiplane-mode (swplb, hwplb, uniplane, none)
    numberOfPlanes: 4 # CLI override: --number-of-planes
  ai: false
clusterConfig:
  - identifier: "group-0"
    machineType: "DGX-B200"
    productType: "NVIDIA-B200"
    labelSelector:
      feature.node.kubernetes.io/pci-15b3.present: "true"
    capabilities:
      nodes:
        sriov: true
        rdma: true
        ib: false
    workerNodes: ["worker-0", "worker-1"]
    nodeSelector:
      nvidia.com/gpu.machine: "DGX-B200"
    pfs:
      - deviceID: "1023"
        pciAddress: "0000:05:00.0"
        rdmaDevice: "mlx5_0"
        networkInterface: "net1"
        traffic: east-west
        rail: 0
      - deviceID: "1023"
        pciAddress: "0000:75:00.0"
        rdmaDevice: "mlx5_1"
        networkInterface: "net2"
        traffic: east-west
        rail: 1
      - deviceID: "1023"
        pciAddress: "0000:6a:00.0"
        rdmaDevice: "mlx5_4"
        networkInterface: "net5"
        traffic: north-south
  - identifier: "group-1"
    machineType: "PowerEdge-XE9680"
    productType: "NVIDIA-H100"
    labelSelector:
      feature.node.kubernetes.io/pci-15b3.present: "true"
    capabilities:
      nodes:
        sriov: true
        rdma: true
        ib: false
    workerNodes: ["worker-2", "worker-3"]
    nodeSelector:
      nvidia.com/gpu.machine: "PowerEdge-XE9680"
    pfs:
      - deviceID: "a2dc"
        pciAddress: "0000:1a:00.0"
        rdmaDevice: ""
        networkInterface: ""
        traffic: east-west
        rail: 0
      - deviceID: "a2dc"
        pciAddress: "0000:3c:00.0"
        rdmaDevice: ""
        networkInterface: ""
        traffic: east-west
        rail: 1
Note
The clusterConfig section is an array of node groups. In heterogeneous clusters, each group contains nodes with identical NIC configurations (same PCI addresses and device IDs). Each PF entry includes a traffic field (east-west or north-south) and a rail field (sequential index for east-west PFs). North-south PFs do not have a rail field. See Heterogeneous Cluster Support for details.
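When editing the clusterConfig section by hand, the invariants from this note can be checked with a short script before the file is passed to --user-config. The sketch below works on the section after it has been loaded into Python lists and dictionaries (loading the YAML is left out). The function name is hypothetical, and the requirement that rails start at 0 with no gaps is an assumption derived from the word "sequential".

```python
from typing import List


def check_cluster_config(groups: List[dict]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for group in groups:
        name = group.get("identifier") or "<single group>"
        rails = []
        for pf in group.get("pfs", []):
            where = f"{name} {pf.get('pciAddress', '?')}"
            traffic = pf.get("traffic")
            if traffic == "east-west":
                if "rail" in pf:
                    rails.append(pf["rail"])
                else:
                    problems.append(f"{where}: east-west PF has no rail")
            elif traffic == "north-south":
                if "rail" in pf:
                    problems.append(f"{where}: north-south PF must not have a rail")
            else:
                problems.append(f"{where}: unknown traffic value {traffic!r}")
        # Assumption: rails are numbered 0, 1, 2, ... without gaps.
        if sorted(rails) != list(range(len(rails))):
            problems.append(f"{name}: rails are not sequential from 0: {rails}")
    return problems
```

Run against the example above, both groups pass: group-0 has rails 0 and 1 plus one north-south PF without a rail, and group-1 has rails 0 and 1.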