Single-node RDMA/IB Report
Generated: 2026-05-22 23:41 Asia/Shanghai
Scope: project CLI gpu_tester.py --test rdma --report --format json, run separately on each host.
Important note: the current repository RDMA test is single-node only. In modules/rdma_test.py, the perftest client connects to localhost, so this report validates local IB device discovery and local perftest behavior. It does not validate cross-node RDMA bandwidth between aikubeworker0012 and aikubeworker0016.
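To make the localhost limitation concrete, here is a minimal sketch of how a perftest command line changes between a same-host run and a cross-node run. The helper `perftest_cmd` is hypothetical and is not taken from modules/rdma_test.py; the exact flags the repository passes are not shown in this report. The sketch relies only on the standard perftest options `-d` (device) and `-i` (port) and on the convention that the client names the server host as a positional argument.

```python
from typing import List, Optional


def perftest_cmd(tool: str, device: str, port: int = 1,
                 server: Optional[str] = None) -> List[str]:
    """Build a perftest argv list (hypothetical helper, not the repo's code).

    Leave `server` unset for the listening side; set it for the client side.
    """
    cmd = [tool, "-d", device, "-i", str(port)]
    if server is not None:
        cmd.append(server)  # client side: host to connect to
    return cmd


# What this report exercised: the client targets the same host.
local_client = perftest_cmd("ib_write_bw", "mlx5_0", server="localhost")

# A cross-node run (NOT covered by this report): the server listens on
# aikubeworker0012 and the client on aikubeworker0016 targets its address.
remote_server = perftest_cmd("ib_write_bw", "mlx5_0")
remote_client = perftest_cmd("ib_write_bw", "mlx5_0", server="172.72.8.12")

print(" ".join(local_client))
print(" ".join(remote_server))
print(" ".join(remote_client))
```

The only difference between the two client invocations is the target host, which is why the numbers in this report cannot stand in for inter-node results.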
Summary
| Host | Devices Found | Active 400G Ports | Active 100G Ports | Down Ports | Overall |
|---|---|---|---|---|---|
| aikubeworker0012 / 172.72.8.12 | 10 | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | mlx5_4, mlx5_5 | mlx5_3, mlx5_9 | WARN |
| aikubeworker0016 / 172.72.8.16 | 10 | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | mlx5_4, mlx5_5 | mlx5_3, mlx5_9 | WARN |
Bandwidth
The bandwidth figures below come from the repository's localhost perftest path and do not measure throughput between hosts.
| Host | ib_write_bw | Threshold | Status | ib_read_bw | Threshold | Status |
|---|---|---|---|---|---|---|
| aikubeworker0012 | 0.13 GB/s | 50 GB/s | WARN | 0.13 GB/s | 50 GB/s | WARN |
| aikubeworker0016 | 0.13 GB/s | 50 GB/s | WARN | 0.13 GB/s | 50 GB/s | WARN |
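For scale, the short calculation below converts the measured and threshold values to Gb/s and expresses the measurement as a fraction of the threshold. It assumes GB/s means 10^9 bytes per second; under that assumption 50 GB/s corresponds to 400 Gb/s, the rate of the NDR ports in the inventory. Whether the threshold was actually chosen as a line-rate figure is not stated in this report.

```python
def gbytes_to_gbits(gb_per_s: float) -> float:
    """Convert GB/s to Gb/s, assuming 1 byte = 8 bits and decimal prefixes."""
    return gb_per_s * 8.0


measured_gbs = 0.13    # ib_write_bw / ib_read_bw result from the table
threshold_gbs = 50.0   # threshold from the table

measured_gbits = gbytes_to_gbits(measured_gbs)
threshold_gbits = gbytes_to_gbits(threshold_gbs)
fraction = measured_gbs / threshold_gbs

print(f"{measured_gbits:.2f} Gb/s of {threshold_gbits:.0f} Gb/s ({fraction:.2%})")
```

The measurement is well under one percent of the threshold, which is consistent with the WARN status on both hosts.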
Latency
| Host | ib_write_lat | Limit | Status | ib_read_lat | Limit | Status |
|---|---|---|---|---|---|---|
| aikubeworker0012 | 4.53 us | 10 us | PASS | 16.00 us | 10 us | WARN |
| aikubeworker0016 | 4.22 us | 10 us | PASS | 16.00 us | 10 us | WARN |
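The Status columns in the bandwidth and latency tables are consistent with a simple comparison rule: bandwidth passes at or above its threshold, latency passes at or below its limit. The sketch below reproduces that rule for the aikubeworker0012 rows. The function `status` is hypothetical; the real logic in modules/rdma_test.py may handle the boundary case differently or define additional levels such as FAIL.

```python
def status(value: float, limit: float, higher_is_better: bool) -> str:
    """Return PASS or WARN from a single comparison (assumed rule)."""
    if higher_is_better:
        ok = value >= limit   # bandwidth: must reach the threshold
    else:
        ok = value <= limit   # latency: must stay within the limit
    return "PASS" if ok else "WARN"


# (metric, measured value, limit, higher_is_better) for aikubeworker0012
checks = [
    ("ib_write_bw", 0.13, 50.0, True),     # GB/s
    ("ib_read_bw", 0.13, 50.0, True),      # GB/s
    ("ib_write_lat", 4.53, 10.0, False),   # us
    ("ib_read_lat", 16.00, 10.0, False),   # us
]

results = {name: status(value, limit, hib) for name, value, limit, hib in checks}
for name, result in results.items():
    print(f"{name}: {result}")
```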
Device Inventory
aikubeworker0012
| Device | Port | State | Physical State | Rate |
|---|---|---|---|---|
| mlx5_0 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_1 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_2 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_3 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
| mlx5_4 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_5 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_6 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_7 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_8 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_9 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
aikubeworker0016
| Device | Port | State | Physical State | Rate |
|---|---|---|---|---|
| mlx5_0 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_1 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_2 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_3 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
| mlx5_4 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_5 | 1 | ACTIVE | LinkUp | 100 Gb/sec (2X HDR) |
| mlx5_6 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_7 | 1 | ACTIVE | LinkUp | 400 Gb/sec (4X NDR) |
| mlx5_8 | 1 | ACTIVE | LinkUp | 25 Gb/sec (1X EDR) |
| mlx5_9 | 1 | DOWN | Disabled | 25 Gb/sec (1X EDR) |
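The Summary table can be cross-checked against this inventory by grouping devices by state and rate. The sketch below does that for the rows above, which are identical on both hosts. The `INVENTORY` list and the `summarize` function are illustrative and are not part of the project's code.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# (device, state, rate) copied from the inventory table
INVENTORY: List[Tuple[str, str, str]] = [
    ("mlx5_0", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_1", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_2", "ACTIVE", "25 Gb/sec (1X EDR)"),
    ("mlx5_3", "DOWN", "25 Gb/sec (1X EDR)"),
    ("mlx5_4", "ACTIVE", "100 Gb/sec (2X HDR)"),
    ("mlx5_5", "ACTIVE", "100 Gb/sec (2X HDR)"),
    ("mlx5_6", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_7", "ACTIVE", "400 Gb/sec (4X NDR)"),
    ("mlx5_8", "ACTIVE", "25 Gb/sec (1X EDR)"),
    ("mlx5_9", "DOWN", "25 Gb/sec (1X EDR)"),
]


def summarize(inventory: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group devices into 'down' or 'active_<rate>g' buckets."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device, state, rate in inventory:
        if state != "ACTIVE":
            groups["down"].append(device)
            continue
        match = re.match(r"(\d+) Gb/sec", rate)
        if match is None:
            groups["active_unknown"].append(device)
            continue
        groups[f"active_{match.group(1)}g"].append(device)
    return dict(groups)


summary = summarize(INVENTORY)
for bucket in sorted(summary):
    print(f"{bucket}: {', '.join(summary[bucket])}")
```

The 400G, 100G, and down groups match the Summary table. The grouping also surfaces mlx5_2 and mlx5_8, which are active at 25 Gb/sec and have no column of their own in the Summary.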
Files
Raw JSON:
reports_rdma_aikubeworker0012.json
reports_rdma_aikubeworker0016.json
Markdown summary:
reports_rdma_single_node_summary.md