# GPU Test Report

- **Date:** 2026-05-23T07:54:48.990378
- **Host:** aikubeworker0012

## Overall Acceptance Verdict

**Result: FAIL**

Missing required evidence:

- GPU Info
- Health Check
- Memory Bandwidth
- Compute Throughput
- NVLink/NVSwitch
- NCCL
- Stress Test
- RDMA
- DCGM
- Training

## Summary

| Test | Result |
|------|--------|
| Multi-node NCCL | FAIL |

## Multi-node NCCL / Cross Leaf

Source: nccl-tests-mpirun | Mode: sweep-nccl-2.27.7

- **Hosts:** nccl-gpu-1 (172.72.8.12), nccl-gpu-2 (172.72.8.16)
- **Preflight:** PASS

### Multi-node NCCL allreduce

| Topology | Peak Bus BW | Peak Size | Avg Bus BW | Threshold | Status |
|----------|-------------|-----------|------------|-----------|--------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 237.26 GB/s | 4G | 150.62 GB/s | >= 480 GB/s | FAIL |

| Topology | NCCL Network | GPU Direct RDMA | GDR Enabled HCAs | GDR Disabled HCAs |
|----------|--------------|-----------------|------------------|-------------------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | IB | ENABLED | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | - |

| Topology | Return Code | Error / Output Tail |
|----------|-------------|---------------------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 0 | aikubeworker0012:2145024:2145189 [0] NCCL INFO comm 0x561f7dc1f780 rank 0 nranks 16 cudaDev 0 busId 18000 - Destroy COMPLETE # Out of bounds values : 0 OK # Avg bus bandwidth : 150.624 # # Collective test concluded: all_reduce_perf # |

### Multi-node NCCL alltoall

| Topology | Peak Bus BW | Peak Size | Avg Bus BW | Threshold | Status |
|----------|-------------|-----------|------------|-----------|--------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 28.78 GB/s | 1G | 23.57 GB/s | >= 75 GB/s | FAIL |

| Topology | NCCL Network | GPU Direct RDMA | GDR Enabled HCAs | GDR Disabled HCAs |
|----------|--------------|-----------------|------------------|-------------------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | IB | ENABLED | mlx5_0, mlx5_1, mlx5_6, mlx5_7 | - |

| Topology | Return Code | Error / Output Tail |
|----------|-------------|---------------------|
| 2 nodes x 8 GPUs NCCL 2.27.7 sweep | 0 | r0012:2145213:2145384 [7] NCCL INFO comm 0x558d54228110 rank 7 nranks 16 cudaDev 7 busId db000 - Destroy COMPLETE aikubeworker0016:1014703:1015544 [0] NCCL INFO comm 0x55ed6d99d8e0 rank 8 nranks 16 cudaDev 0 busId 18000 - Destroy COMPLETE |

**Overall: FAIL**

---

*Generated by GPU Test Suite v0.2.0*
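Note on reading the bandwidth tables: nccl-tests reports bus bandwidth (busbw), not raw algorithm bandwidth (algbw = size / time). For all-reduce, busbw = algbw * 2(n-1)/n. For all-to-all, busbw = algbw * (n-1)/n. Here n is the number of ranks, which is 16 in this run. The minimal Python sketch below applies those factors and computes how much of each threshold the measured peak reached. It assumes that the suite's "Peak Bus BW" column is the nccl-tests busbw figure and that the thresholds apply to bus bandwidth, as the table layout suggests. The helper names are hypothetical and are not part of GPU Test Suite.

```python
"""Sketch: nccl-tests bus-bandwidth convention and threshold check.

Assumptions: "Peak Bus BW" is the nccl-tests busbw value, and the
acceptance thresholds are expressed in bus bandwidth. All helper names
are hypothetical illustrations, not GPU Test Suite APIs.
"""

# Bus-bandwidth correction factors (per the nccl-tests performance notes),
# as a function of the number of ranks n.
BUSBW_FACTOR = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "alltoall": lambda n: (n - 1) / n,
}


def bus_bandwidth(collective: str, alg_bw_gbs: float, nranks: int) -> float:
    """Convert algorithm bandwidth (size / time, GB/s) to bus bandwidth (GB/s)."""
    return alg_bw_gbs * BUSBW_FACTOR[collective](nranks)


def threshold_ratio(peak_bus_bw: float, threshold: float) -> float:
    """Fraction of the acceptance threshold reached by the measured peak."""
    return peak_bus_bw / threshold


if __name__ == "__main__":
    NRANKS = 16  # 2 nodes x 8 GPUs, matching "nranks 16" in the NCCL log tail
    # Peak bus bandwidth and threshold (GB/s) copied from the report tables.
    results = {
        "all_reduce": (237.26, 480.0),
        "alltoall": (28.78, 75.0),
    }
    for name, (peak, limit) in results.items():
        ratio = threshold_ratio(peak, limit)
        status = "PASS" if peak >= limit else "FAIL"
        factor = BUSBW_FACTOR[name](NRANKS)
        print(f"{name}: busbw factor {factor:.4f}, "
              f"{peak:.2f} GB/s = {ratio:.1%} of {limit:.0f} GB/s -> {status}")
```

With 16 ranks the all-reduce factor is 1.875. If the threshold is indeed a bus-bandwidth figure, then 480 GB/s corresponds to an algorithm bandwidth of 256 GB/s.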