# GPU Test Report

- **Date:** 2026-05-22T15:49:02.368516
- **Host:** aikubeworker0016
- **GPU:** NVIDIA H100 80GB HBM3 x8
- **Driver:** 580.159.03 | **CUDA:** 13.0

## Overall Acceptance Verdict

**Result: FAIL**

Failed or unverified items:

- Compute Throughput: FAIL (worst FP32 52 vs >= 54)
- NCCL: FAIL (no nccl-tests bus BW)
- RDMA: FAIL
- Training: UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict)

Missing required evidence:

- NVLink/NVSwitch
- DCGM

## Summary

| Test | Result |
|------|--------|
| GPU Info | PASS (8 GPUs detected) |
| Health Check | PASS |
| Memory Bandwidth | PASS (108.1%) |
| Compute Throughput | FAIL (worst FP32 52 vs >= 54) |
| NCCL | FAIL (no nccl-tests bus BW) |
| Stress Test | PASS |
| RDMA | FAIL |
| Training | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |

## GPU Information

| GPU | Model | VRAM | Temp | Power | SM Clock |
|-----|-------|------|------|-------|----------|
| 0 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 70/700W | 345 MHz |
| 1 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 2 | NVIDIA H100 80GB HBM3 | 81559 MB | 22C | 67/700W | 345 MHz |
| 3 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 4 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 5 | NVIDIA H100 80GB HBM3 | 81559 MB | 23C | 69/700W | 345 MHz |
| 6 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 7 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 66/700W | 345 MHz |

## Health Check

**Overall: PASS**

| GPU | Temp | Power | ECC | PCIe | Throttle | Status |
|-----|------|-------|-----|------|----------|--------|
| 0 | 21C PASS | 70W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 1 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 2 | 22C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 3 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 4 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 5 | 23C PASS | 69W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 6 | 21C PASS | 68W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 7 | 21C PASS | 66W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |

## Memory Bandwidth

Source: nvbandwidth

| Metric | Value | Peak | Efficiency |
|--------|-------|------|------------|
| H2D (PCIe) | 55.5 GB/s | 64 GB/s | 86.7% |
| D2H (PCIe) | 55.3 GB/s | 64 GB/s | 86.4% |
| D2D (NVLink) | 486.5 GB/s | 450 GB/s | 108.1% |

**Verdict: PASS** (D2D efficiency 108.1%)

## Compute Throughput

| DType | Achieved (TFLOPS) | Peak | Threshold | Status |
|-------|-------------------|------|------------|--------|
| FP32 | 51.9 | 67 | >= 54 | FAIL |
| TF32 | 357.0 | 495 | >= 444 | FAIL |
| FP16 | 664.0 | 990 | >= 734 | FAIL |
| BF16 | 700.1 | 990 | >= 745 | FAIL |
| FP8 | 1116.2 | 1979 | >= 1400 | FAIL |

**Verdict: FAIL** (absolute TFLOPS thresholds; worst efficiency 56.4%)

### Compute Per-GPU TFLOPS

| GPU | FP32 | TF32 | FP16 | BF16 | FP8 |
|---|---|---|---|---|---|
| 0 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 1 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 2 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 3 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 4 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 5 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 6 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 7 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |

## NCCL Multi-GPU

Source: torchrun_fallback | GPUs: 8

> Functional NCCL smoke only: nccl-tests bus bandwidth was not measured, so this does not satisfy production acceptance.
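For context on the missing metric: nccl-tests derives bus bandwidth from the measured algorithm bandwidth (bytes moved divided by elapsed time) by applying a per-collective correction factor. The sketch below follows the factors described in the nccl-tests performance notes. The function names and the sample numbers in the usage note are hypothetical and are not measurements from this host.

```python
# Minimal sketch of the nccl-tests bus-bandwidth convention.
# Function names are illustrative; they are not part of any tool in this report.

def busbw_factor(op: str, n_ranks: int) -> float:
    """Return the factor that converts algorithm bandwidth to bus bandwidth."""
    if n_ranks < 2:
        raise ValueError("bus bandwidth is only meaningful for 2 or more ranks")
    factors = {
        "allreduce": 2 * (n_ranks - 1) / n_ranks,
        "allgather": (n_ranks - 1) / n_ranks,
        "reducescatter": (n_ranks - 1) / n_ranks,
        "alltoall": (n_ranks - 1) / n_ranks,
        "broadcast": 1.0,
    }
    if op not in factors:
        raise ValueError(f"unknown collective: {op}")
    return factors[op]


def bus_bandwidth_gbs(op: str, n_ranks: int, size_bytes: float, seconds: float) -> float:
    """Compute bus bandwidth in GB/s from message size and elapsed time."""
    algbw = size_bytes / seconds / 1e9  # algorithm bandwidth in GB/s
    return algbw * busbw_factor(op, n_ranks)
```

As an illustration only, `bus_bandwidth_gbs("allreduce", 8, 1e9, 0.005)` corresponds to an algorithm bandwidth of 200 GB/s and a bus bandwidth of 350 GB/s. A timing-based smoke test that records no message size or elapsed time cannot produce this value, which is consistent with the 0.0 entries reported in this section.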

| Operation | Bus BW (GB/s) | Threshold | Status |
|-----------|---------------|-----------|--------|
| NCCL version 2.21.5+cuda12.4 | 0.0 | >= 0 | FAIL |
| allreduce | 0.0 | >= 0 | PASS |
| broadcast | 0.0 | >= 0 | PASS |
| allgather | 0.0 | >= 0 | PASS |
| reducescatter | 0.0 | >= 0 | PASS |
| alltoall | 0.0 | >= 0 | PASS |

**Overall: FAIL**

## Stress Test

- **Source:** pytorch
- **Duration:** 60s (requested 60s)
- **Result: PASS**

## RDMA/InfiniBand

> Legacy RDMA result re-evaluated against the current PDF acceptance thresholds; the old WARN statuses and the old 50 GB/s and 10 us limits are not used for the verdict.

| Test | Value | Threshold | Status |
|------|-------|-----------|--------|
| ib_write_bw | 0.13 GB/s | >= 47 GB/s | FAIL |
| ib_read_bw | 0.13 GB/s | >= 47 GB/s | FAIL |
| ib_write_lat | 4.10 us | <= 2 us | FAIL |
| ib_read_lat | 16.00 us | <= 3.5 us | FAIL |

- **Failure reasons:**
  - ib_write_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_read_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_write_lat latency 4.1 us > 2 us
  - ib_read_lat latency 16.0 us > 3.5 us

**Overall: FAIL**

## Training Simulation

| Metric | Value |
|--------|-------|
| Model | synthetic_transformer |
| Params | 1470.5M |
| Throughput | 52471 tokens/sec |
| Avg Step Time | 312.3 ms |
| Peak Memory | 27.3 GB |
| Final Loss | 0.0041 |
| Step Jitter | N/A |
| Distributed Mode | N/A |
| Acceptance Gaps | Missing fields: passed, step_jitter_pct, distributed_mode, loss_finite |
| Verdict | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |

---

*Generated by GPU Test Suite v0.2.0*
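
The efficiency and status columns in the Memory Bandwidth and Compute Throughput tables can be cross-checked with a few lines of arithmetic. The sketch below assumes that efficiency is achieved divided by peak, rounded to one decimal place, and that a compute row passes when the achieved value meets its absolute threshold. The input values are copied from this report; the helper name `efficiency_pct` is hypothetical and is not taken from the test suite.

```python
# Cross-check of the efficiency and status columns, using values from this report.
# Assumption: efficiency = achieved / peak, rounded to one decimal place.

def efficiency_pct(achieved: float, peak: float) -> float:
    """Return achieved/peak as a percentage rounded to one decimal place."""
    return round(100.0 * achieved / peak, 1)


# Memory Bandwidth: metric -> (achieved GB/s, peak GB/s)
memory = {
    "H2D (PCIe)": (55.5, 64),
    "D2H (PCIe)": (55.3, 64),
    "D2D (NVLink)": (486.5, 450),
}

# Compute Throughput: dtype -> (achieved TFLOPS, peak TFLOPS, threshold TFLOPS)
compute = {
    "FP32": (51.9, 67, 54),
    "TF32": (357.0, 495, 444),
    "FP16": (664.0, 990, 734),
    "BF16": (700.1, 990, 745),
    "FP8": (1116.2, 1979, 1400),
}

memory_eff = {name: efficiency_pct(a, p) for name, (a, p) in memory.items()}
compute_eff = {name: efficiency_pct(a, p) for name, (a, p, _) in compute.items()}
compute_status = {
    name: "PASS" if a >= t else "FAIL" for name, (a, _, t) in compute.items()
}
worst_efficiency = min(compute_eff.values())

for name, eff in memory_eff.items():
    print(f"{name}: {eff}%")
for name, eff in compute_eff.items():
    print(f"{name}: {eff}% {compute_status[name]}")
print(f"worst compute efficiency: {worst_efficiency}%")
```

Under these assumptions the recomputed values agree with the tables: 86.7%, 86.4% and 108.1% for memory bandwidth, a FAIL status for every compute dtype, and a worst compute efficiency of 56.4% (FP8).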