GPU Test Report
- Date: 2026-05-23T07:54:48.990378
- Host: aikubeworker0012
Overall Acceptance Verdict
Result: FAIL
Missing required evidence:
- GPU Info
- Health Check
- Memory Bandwidth
- Compute Throughput
- NVLink/NVSwitch
- NCCL
- Stress Test
- RDMA
- DCGM
- Training
Summary
| Test | Result |
| Multi-node NCCL | FAIL |
Multi-node NCCL / Cross Leaf
Source: nccl-tests-mpirun | Mode: sweep-nccl-2.27.7
- Hosts: nccl-gpu-1(172.72.8.12), nccl-gpu-2(172.72.8.16)
- Preflight: PASS
Multi-node NCCL allreduce
| Topology | Peak Bus BW | Peak Size | Avg Bus BW | Threshold | Status |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 237.26 GB/s | 4G | 150.62 GB/s | >= 480 GB/s | FAIL |
| Topology | NCCL Network | GPU Direct RDMA | GDR Enabled HCAs | GDR Disabled HCAs |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | IB | ENABLED | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | - |
| Topology | Return Code | Error / Output Tail |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 0 | aikubeworker0012:2145024:2145189 [0] NCCL INFO comm 0x561f7dc1f780 rank 0 nranks 16 cudaDev 0 busId 18000 - Destroy COMPLETE # Out of bounds values : 0 OK # Avg bus bandwidth : 150.624 # # Collective test concluded: all_reduce_perf # |
Multi-node NCCL alltoall
| Topology | Peak Bus BW | Peak Size | Avg Bus BW | Threshold | Status |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 28.78 GB/s | 1G | 23.57 GB/s | >= 75 GB/s | FAIL |
| Topology | NCCL Network | GPU Direct RDMA | GDR Enabled HCAs | GDR Disabled HCAs |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | IB | ENABLED | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | - |
| Topology | Return Code | Error / Output Tail |
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 0 | r0012:2145213:2145384 [7] NCCL INFO comm 0x558d54228110 rank 7 nranks 16 cudaDev 7 busId db000 - Destroy COMPLETE aikubeworker0016:1014703:1015544 [0] NCCL INFO comm 0x55ed6d99d8e0 rank 8 nranks 16 cudaDev 0 busId 18000 - Destroy COMPLETE |
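Both tests returned code 0, so the FAIL statuses come from the bandwidth thresholds, not from a crash. The following is a minimal sketch of such a check, not the suite's actual implementation. It assumes that peak bus bandwidth is the value compared against the threshold; the report does not state whether peak or average is used, and both are below threshold here. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BandwidthResult:
    """One row of a bandwidth table (hypothetical structure, not the suite's own)."""
    test: str
    peak_gb_per_s: float       # peak bus bandwidth reported by the sweep, GB/s
    threshold_gb_per_s: float  # minimum acceptable bus bandwidth, GB/s

    @property
    def status(self) -> str:
        # Assumption: the peak value is what gets compared to the threshold.
        return "PASS" if self.peak_gb_per_s >= self.threshold_gb_per_s else "FAIL"

    @property
    def percent_of_threshold(self) -> float:
        return 100.0 * self.peak_gb_per_s / self.threshold_gb_per_s


# Values copied from the allreduce and alltoall tables in this report.
results = [
    BandwidthResult("allreduce", 237.26, 480.0),
    BandwidthResult("alltoall", 28.78, 75.0),
]

# The overall result passes only if every individual test passes.
overall = "PASS" if all(r.status == "PASS" for r in results) else "FAIL"

for r in results:
    print(f"{r.test}: {r.status} ({r.percent_of_threshold:.1f}% of threshold)")
print(f"Overall: {overall}")
```

Expressing each result as a percentage of its threshold makes the size of the shortfall visible alongside the PASS/FAIL status.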
Overall: FAIL
Generated by GPU Test Suite v0.2.0