GPU Test Report

  • Date: 2026-05-22T15:49:02.368516
  • Host: aikubeworker0016
  • GPU: NVIDIA H100 80GB HBM3 x8
  • Driver: 580.159.03 | CUDA: 13.0

Overall Acceptance Verdict

Result: FAIL

Failed or unverified items:

  • Compute Throughput: FAIL (worst: FP32 at 52 TFLOPS vs threshold >= 54 TFLOPS)
  • NCCL: FAIL (nccl-tests bus bandwidth not measured)
  • RDMA: FAIL
  • Training: UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict)

Missing required evidence:

  • NVLink/NVSwitch
  • DCGM
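
The verdict above is consistent with a simple aggregation rule: any item that is not PASS, or any required evidence that is missing, blocks acceptance. The following is a minimal sketch of that rule, not the suite's actual implementation; the `Item` class, the `overall_verdict` function, and the `MISSING` status label are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    status: str  # "PASS", "FAIL", "UNVERIFIED", or "MISSING" (hypothetical labels)


def overall_verdict(items):
    """Return the overall result and the names of all blocking items.

    Assumed rule: anything other than PASS blocks acceptance.
    """
    blocking = [item.name for item in items if item.status != "PASS"]
    return ("FAIL" if blocking else "PASS"), blocking


# Statuses copied from this report; missing evidence is modeled as "MISSING".
items = [
    Item("GPU Info", "PASS"),
    Item("Health Check", "PASS"),
    Item("Memory Bandwidth", "PASS"),
    Item("Compute Throughput", "FAIL"),
    Item("NCCL", "FAIL"),
    Item("Stress Test", "PASS"),
    Item("RDMA", "FAIL"),
    Item("Training", "UNVERIFIED"),
    Item("NVLink/NVSwitch", "MISSING"),
    Item("DCGM", "MISSING"),
]

result, blocking = overall_verdict(items)
print(result, blocking)
```

Under this assumed rule an UNVERIFIED item is treated as strictly as a FAIL, which matches how the Training entry appears among the failed or unverified items.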

Summary

| Test | Result |
| --- | --- |
| GPU Info | PASS (8 GPUs detected) |
| Health Check | PASS |
| Memory Bandwidth | PASS (D2D efficiency 108.1%) |
| Compute Throughput | FAIL (worst: FP32 at 52 TFLOPS vs threshold >= 54 TFLOPS) |
| NCCL | FAIL (nccl-tests bus bandwidth not measured) |
| Stress Test | PASS |
| RDMA | FAIL |
| Training | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |

GPU Information

| GPU | Model | VRAM | Temp | Power | SM Clock |
| --- | --- | --- | --- | --- | --- |
| 0 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 70/700W | 345 MHz |
| 1 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 2 | NVIDIA H100 80GB HBM3 | 81559 MB | 22C | 67/700W | 345 MHz |
| 3 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 4 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 5 | NVIDIA H100 80GB HBM3 | 81559 MB | 23C | 69/700W | 345 MHz |
| 6 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 7 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 66/700W | 345 MHz |

Health Check

Overall: PASS

GPU Temp Power ECC PCIe Throttle Status
0 21C PASS 70W PASS S:0 D:0 Gen5x16 PASS WARN
1 21C PASS 67W PASS S:0 D:0 Gen5x16 PASS WARN
2 22C PASS 67W PASS S:0 D:0 Gen5x16 PASS WARN
3 21C PASS 67W PASS S:0 D:0 Gen5x16 PASS WARN
4 21C PASS 67W PASS S:0 D:0 Gen5x16 PASS WARN
5 23C PASS 69W PASS S:0 D:0 Gen5x16 PASS WARN
6 21C PASS 68W PASS S:0 D:0 Gen5x16 PASS WARN
7 21C PASS 66W PASS S:0 D:0 Gen5x16 PASS WARN

Memory Bandwidth

Source: nvbandwidth

| Metric | Value | Peak | Efficiency |
| --- | --- | --- | --- |
| H2D (PCIe) | 55.5 GB/s | 64 GB/s | 86.7% |
| D2H (PCIe) | 55.3 GB/s | 64 GB/s | 86.4% |
| D2D (NVLink) | 486.5 GB/s | 450 GB/s | 108.1% |

Verdict: PASS (D2D efficiency 108.1%)
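
The efficiency column is the measured value divided by the reference peak. The sketch below reproduces the three percentages from the table; the function name `efficiency_pct` is a hypothetical helper, and rounding to one decimal place is assumed from the precision shown in the report.

```python
def efficiency_pct(measured_gbps, peak_gbps):
    """Measured bandwidth as a percentage of the reference peak, one decimal."""
    return round(100.0 * measured_gbps / peak_gbps, 1)


# (measured GB/s, reference peak GB/s) copied from the table above.
rows = {
    "H2D (PCIe)": (55.5, 64.0),
    "D2H (PCIe)": (55.3, 64.0),
    "D2D (NVLink)": (486.5, 450.0),
}

efficiencies = {name: efficiency_pct(m, p) for name, (m, p) in rows.items()}
for name, pct in efficiencies.items():
    print(f"{name}: {pct}%")
```

An efficiency above 100% means only that the measured value exceeded the reference peak used in the table (450 GB/s for D2D).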

Compute Throughput

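| DType | Achieved (TFLOPS) | Peak (TFLOPS) | Threshold (TFLOPS) | Status |
| --- | --- | --- | --- | --- |
| FP32 | 51.9 | 67 | >= 54 | FAIL |
| TF32 | 357.0 | 495 | >= 444 | FAIL |
| FP16 | 664.0 | 990 | >= 734 | FAIL |
| BF16 | 700.1 | 990 | >= 745 | FAIL |
| FP8 | 1116.2 | 1979 | >= 1400 | FAIL |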

Verdict: FAIL (absolute TFLOPS thresholds; worst efficiency 56.4%)
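
The per-dtype status and the worst-efficiency figure can be recomputed from the table. This is a minimal sketch assuming the status is a plain `achieved >= threshold` comparison and that efficiency is achieved divided by peak; the variable names are illustrative and not taken from the suite.

```python
# dtype -> (achieved, peak, threshold), all in TFLOPS, copied from the table above.
results = {
    "FP32": (51.9, 67.0, 54.0),
    "TF32": (357.0, 495.0, 444.0),
    "FP16": (664.0, 990.0, 734.0),
    "BF16": (700.1, 990.0, 745.0),
    "FP8": (1116.2, 1979.0, 1400.0),
}

# Absolute-threshold check: a dtype passes only if it meets its TFLOPS floor.
status = {
    dtype: "PASS" if achieved >= threshold else "FAIL"
    for dtype, (achieved, _peak, threshold) in results.items()
}

# Efficiency relative to peak, rounded to one decimal as in the report.
efficiency = {
    dtype: round(100.0 * achieved / peak, 1)
    for dtype, (achieved, peak, _threshold) in results.items()
}

worst_dtype = min(efficiency, key=efficiency.get)
print(status)
print(f"worst efficiency: {worst_dtype} at {efficiency[worst_dtype]}%")
```

On these numbers the 56.4% worst efficiency comes from FP8, while the FP32 figure quoted in the summary is the absolute TFLOPS shortfall.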

Compute Per-GPU TFLOPS

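| GPU | FP32 | TF32 | FP16 | BF16 | FP8 |
| --- | --- | --- | --- | --- | --- |
| 0 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 1 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 2 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 3 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 4 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 5 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 6 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 7 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |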

NCCL Multi-GPU

Source: torchrun_fallback | GPUs: 8

Functional NCCL smoke test only: nccl-tests bus bandwidth was not measured, so this result does not satisfy production acceptance.

NCCL version: 2.21.5+cuda12.4

| Operation | Bus BW (GB/s) | Threshold | Status |
| --- | --- | --- | --- |
| allreduce | 0.0 | >= 0 | PASS |
| broadcast | 0.0 | >= 0 | PASS |
| allgather | 0.0 | >= 0 | PASS |
| reducescatter | 0.0 | >= 0 | PASS |
| alltoall | 0.0 | >= 0 | PASS |

Overall: FAIL
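
For context on the missing metric: nccl-tests derives bus bandwidth from algorithm bandwidth (bytes moved divided by elapsed time) using a per-collective correction factor. The sketch below applies those factors as documented by nccl-tests; the function name and the input values are hypothetical and are not measurements from this host.

```python
def bus_bandwidth_gbps(op, size_bytes, seconds, n_ranks):
    """Convert a timed collective into bus bandwidth (GB/s), nccl-tests style."""
    algbw = size_bytes / seconds / 1e9  # algorithm bandwidth in GB/s
    n = n_ranks
    factors = {
        "allreduce": 2 * (n - 1) / n,
        "allgather": (n - 1) / n,
        "reducescatter": (n - 1) / n,
        "alltoall": (n - 1) / n,
        "broadcast": 1.0,
    }
    return algbw * factors[op]


# Illustrative input only: 1 GB reduced in 5 ms across 8 ranks.
example = bus_bandwidth_gbps("allreduce", 1e9, 0.005, 8)
print(f"{example:.1f} GB/s")
```

A functional smoke run that only checks correctness produces no timing of this kind, which is why every row above shows 0.0 GB/s.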

Stress Test

  • Source: pytorch
  • Duration: 60s (requested 60s)
  • Result: PASS

RDMA/InfiniBand

This legacy RDMA result was re-evaluated against the current PDF acceptance thresholds. The old WARN statuses and the old limits (50 GB/s, 10 us) are not used for the verdict.

| Test | Value | Threshold | Status |
| --- | --- | --- | --- |
| ib_write_bw | 0.1 GB/s | >= 47 GB/s | FAIL |
| ib_read_bw | 0.1 GB/s | >= 47 GB/s | FAIL |
| ib_write_lat | 4.10 us | <= 2 us | FAIL |
| ib_read_lat | 16.00 us | <= 3.5 us | FAIL |

  • Failure reasons:
    • ib_write_bw bandwidth 0.13 GB/s < 47 GB/s
    • ib_read_bw bandwidth 0.13 GB/s < 47 GB/s
    • ib_write_lat latency 4.1 us > 2 us
    • ib_read_lat latency 16.0 us > 3.5 us

Overall: FAIL
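
The RDMA verdict combines two kinds of limits: bandwidth must meet a floor and latency must stay under a ceiling. A minimal sketch of that evaluation follows, using the thresholds and measured values listed above; the `THRESHOLDS` table and the `evaluate` function are hypothetical names, not the suite's own code.

```python
import operator

# test name -> (comparison that must hold, limit, unit), from the table above.
THRESHOLDS = {
    "ib_write_bw": (operator.ge, 47.0, "GB/s"),
    "ib_read_bw": (operator.ge, 47.0, "GB/s"),
    "ib_write_lat": (operator.le, 2.0, "us"),
    "ib_read_lat": (operator.le, 3.5, "us"),
}

# Measured values as stated in the failure reasons.
measured = {
    "ib_write_bw": 0.13,
    "ib_read_bw": 0.13,
    "ib_write_lat": 4.1,
    "ib_read_lat": 16.0,
}


def evaluate(values):
    """Return the overall verdict and the list of tests that miss their limit."""
    failures = [
        name
        for name, (holds, limit, _unit) in THRESHOLDS.items()
        if not holds(values[name], limit)
    ]
    return ("FAIL" if failures else "PASS"), failures


verdict, failures = evaluate(measured)
print(verdict, failures)
```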

Training Simulation

| Metric | Value |
| --- | --- |
| Model | synthetic_transformer |
| Params | 1470.5M |
| Throughput | 52471 tokens/sec |
| Avg Step Time | 312.3 ms |
| Peak Memory | 27.3 GB |
| Final Loss | 0.0041 |
| Step Jitter | N/A |
| Distributed Mode | N/A |
| Acceptance Gaps | Missing fields: passed, step_jitter_pct, distributed_mode, loss_finite |
| Verdict | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |
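
The UNVERIFIED verdict follows from the acceptance gaps: the legacy result does not carry the fields needed to judge it. The sketch below shows one way such a check could work. Only the four required field names come from this report; the other dictionary keys and the `training_verdict` function are hypothetical.

```python
# Field names taken from the "Acceptance Gaps" row.
REQUIRED_FIELDS = ("passed", "step_jitter_pct", "distributed_mode", "loss_finite")

# Legacy result with values from the table; the key names are assumptions.
legacy_result = {
    "model": "synthetic_transformer",
    "params_m": 1470.5,
    "tokens_per_sec": 52471,
    "avg_step_time_ms": 312.3,
    "peak_memory_gb": 27.3,
    "final_loss": 0.0041,
}


def training_verdict(result):
    """Return (verdict, missing fields); incomplete results stay UNVERIFIED."""
    gaps = [field for field in REQUIRED_FIELDS if field not in result]
    if gaps:
        return "UNVERIFIED", gaps
    return ("PASS" if result["passed"] else "FAIL"), []


verdict, gaps = training_verdict(legacy_result)
print(verdict, gaps)
```

Keeping UNVERIFIED distinct from FAIL separates a result that was never judged from one that was judged and rejected.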

Generated by GPU Test Suite v0.2.0