Single-node RDMA/IB Report

Generated: 2026-05-22 23:41 Asia/Shanghai

Scope: project CLI gpu_tester.py --test rdma --report --format json, run separately on each host.

Important note: the repository's current RDMA test is single-node only. In modules/rdma_test.py, the perftest client connects to localhost, so this report validates only local IB device discovery and local perftest behavior. It does not validate cross-node RDMA bandwidth between aikubeworker0012 and aikubeworker0016.
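For contrast, a cross-node run would start a perftest server on one host and point the client at the peer rather than at localhost. The sketch below only builds the two command lines and runs nothing. The helper name `perftest_commands` is hypothetical, mlx5_0 is just an example device, and none of this reflects how modules/rdma_test.py is actually structured. It relies only on the standard perftest conventions that `-d` selects the IB device and that the client takes the server host as a positional argument.

```python
from typing import List, Tuple


def perftest_commands(tool: str, device: str, server_host: str) -> Tuple[List[str], List[str]]:
    """Return (server_argv, client_argv) for a two-host perftest run.

    Hypothetical helper: it only builds argument lists and executes nothing.
    """
    # The server side is started without a host argument and waits for a client.
    server = [tool, "-d", device]
    # The client side names the server host; the current test uses localhost here.
    client = [tool, "-d", device, server_host]
    return server, client


server_cmd, client_cmd = perftest_commands("ib_write_bw", "mlx5_0", "aikubeworker0016")
print("on aikubeworker0016:", " ".join(server_cmd))
print("on aikubeworker0012:", " ".join(client_cmd))
```

Passing "localhost" as `server_host` reproduces the shape of the single-node case described above, which is why the results in this report say nothing about the fabric between the two hosts.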

Summary

| Host | Devices Found | Active 400G Ports | Active 100G Ports | Down Ports | Overall |
| --- | --- | --- | --- | --- | --- |
| aikubeworker0012 / 172.72.8.12 | 10 | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | mlx5_4, mlx5_5 | mlx5_3, mlx5_9 | WARN |
| aikubeworker0016 / 172.72.8.16 | 10 | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | mlx5_4, mlx5_5 | mlx5_3, mlx5_9 | WARN |
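The port groupings in this summary can be rederived from the rows in the Device Inventory section. The following is a minimal sketch, not the project's implementation: the function name `summarize_ports` and the tuple layout are assumptions made for illustration, and the rows are copied by hand from the inventory (which is identical on both hosts).

```python
import re
from collections import defaultdict

# (device, state, rate) rows copied from the Device Inventory section.
INVENTORY = [
    ("mlx5_0", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_1", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_2", "ACTIVE", "25 Gb/sec (1X EDR)"),
    ("mlx5_3", "DOWN", "25 Gb/sec (1X EDR)"),
    ("mlx5_4", "ACTIVE", "100 Gb/sec (2X HDR)"),
    ("mlx5_5", "ACTIVE", "100 Gb/sec (2X HDR)"),
    ("mlx5_6", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_7", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_8", "ACTIVE", "25 Gb/sec (1X EDR)"),
    ("mlx5_9", "DOWN", "25 Gb/sec (1X EDR)"),
]


def summarize_ports(rows):
    """Group ACTIVE devices by link rate in Gb/s and list non-ACTIVE devices separately."""
    active = defaultdict(list)
    down = []
    for device, state, rate in rows:
        if state != "ACTIVE":
            down.append(device)
            continue
        match = re.match(r"(\d+) Gb/sec", rate)
        if match is None:
            raise ValueError(f"unrecognized rate string: {rate!r}")
        active[int(match.group(1))].append(device)
    return dict(active), down


active_by_rate, down_ports = summarize_ports(INVENTORY)
print("devices found:", len(INVENTORY))
for gbps in sorted(active_by_rate, reverse=True):
    print(f"active {gbps}G:", ", ".join(active_by_rate[gbps]))
print("down:", ", ".join(down_ports))
```

Grouping this way also surfaces mlx5_2 and mlx5_8, which are ACTIVE at 25 Gb/sec in the inventory but have no column in the summary table.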

Bandwidth

The bandwidth figures below come from the repository's localhost perftest path, not from a cross-node run.

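| Host | ib_write_bw | Write Threshold | Write Status | ib_read_bw | Read Threshold | Read Status |
| --- | --- | --- | --- | --- | --- | --- |
| aikubeworker0012 | 0.13 GB/s | 50 GB/s | WARN | 0.13 GB/s | 50 GB/s | WARN |
| aikubeworker0016 | 0.13 GB/s | 50 GB/s | WARN | 0.13 GB/s | 50 GB/s | WARN |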
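To put these figures in scale, note that the table is in gigabytes per second while the inventory rates are in gigabits per second. The short sketch below does the unit conversion and computes how far the measured value falls short of the threshold. It assumes decimal units and 8 bits per byte, and ignores protocol overhead. The 50 GB/s threshold happens to equal the nominal 400 Gb/sec port rate divided by 8; whether the project derives its threshold that way is an assumption, not something the report states.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a nominal rate in Gb/s to GB/s (8 bits per byte, decimal units)."""
    return gbit_per_s / 8.0


THRESHOLD_GBYTES = 50.0  # threshold column of the Bandwidth table
MEASURED_GBYTES = 0.13   # ib_write_bw and ib_read_bw on both hosts

nominal_400g = gbit_to_gbyte(400)
fraction_of_threshold = MEASURED_GBYTES / THRESHOLD_GBYTES

print(f"400 Gb/s nominal = {nominal_400g:.1f} GB/s")
print(f"measured / threshold = {fraction_of_threshold:.2%}")
```

The measured 0.13 GB/s is about 0.26% of the threshold, which is consistent with the WARN status on both hosts.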

Latency

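| Host | ib_write_lat | Write Limit | Write Status | ib_read_lat | Read Limit | Read Status |
| --- | --- | --- | --- | --- | --- | --- |
| aikubeworker0012 | 4.53 us | 10 us | PASS | 16.00 us | 10 us | WARN |
| aikubeworker0016 | 4.22 us | 10 us | PASS | 16.00 us | 10 us | WARN |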
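The statuses in the Bandwidth and Latency tables follow two opposite rules: bandwidth must reach a minimum, while latency must stay under a maximum. The sketch below reproduces the reported statuses from the reported values. The function names, the dictionary layout, and the inclusive boundary handling are assumptions for illustration; the report shows only PASS and WARN, so no other status is modeled.

```python
BANDWIDTH_MIN_GBYTES = 50.0  # threshold from the Bandwidth table
LATENCY_MAX_US = 10.0        # limit from the Latency table


def bandwidth_status(gbytes_per_s: float) -> str:
    """Higher is better: PASS when the value reaches the threshold."""
    return "PASS" if gbytes_per_s >= BANDWIDTH_MIN_GBYTES else "WARN"


def latency_status(microseconds: float) -> str:
    """Lower is better: PASS when the value stays within the limit."""
    return "PASS" if microseconds <= LATENCY_MAX_US else "WARN"


# Values copied from the Bandwidth and Latency tables.
RESULTS = {
    "aikubeworker0012": {"ib_write_bw": 0.13, "ib_read_bw": 0.13,
                         "ib_write_lat": 4.53, "ib_read_lat": 16.00},
    "aikubeworker0016": {"ib_write_bw": 0.13, "ib_read_bw": 0.13,
                         "ib_write_lat": 4.22, "ib_read_lat": 16.00},
}


def evaluate(metrics: dict) -> dict:
    """Map each metric name to PASS or WARN using the rule for its kind."""
    statuses = {}
    for name, value in metrics.items():
        if name.endswith("_bw"):
            statuses[name] = bandwidth_status(value)
        else:
            statuses[name] = latency_status(value)
    return statuses


for host, metrics in RESULTS.items():
    print(host, evaluate(metrics))
```

On both hosts this yields PASS only for ib_write_lat, matching the tables.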

Device Inventory

aikubeworker0012

| Device | Port | State | Physical State | Rate |
| --- | --- | --- | --- | --- |
| mlx5_0 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_1 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_2 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_3 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
| mlx5_4 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_5 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_6 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_7 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_8 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_9 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |

aikubeworker0016

| Device | Port | State | Physical State | Rate |
| --- | --- | --- | --- | --- |
| mlx5_0 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_1 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_2 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_3 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
| mlx5_4 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_5 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_6 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_7 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_8 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_9 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |

Files

Raw JSON:

  • reports_rdma_aikubeworker0012.json
  • reports_rdma_aikubeworker0016.json

Markdown summary:

  • reports_rdma_single_node_summary.md