GPU Test Report
- Date: 2026-05-22T15:49:02.368516
- Host: aikubeworker0016
- GPU: NVIDIA H100 80GB HBM3 x8
- Driver: 580.159.03 | CUDA: 13.0
Overall Acceptance Verdict
Result: FAIL
Failed or unverified items:
- Compute Throughput: FAIL (worst FP32 51.9 TFLOPS vs >= 54 TFLOPS)
- NCCL: FAIL (no nccl-tests bus BW)
- RDMA: FAIL
- Training: UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict)
Missing required evidence:
- NVLink/NVSwitch
- DCGM
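The verdict above is consistent with a simple aggregation rule: the run fails if any item is not PASS or if required evidence is missing. A minimal Python sketch of that rule follows. The function name `overall_verdict` and the input shapes are hypothetical and are not taken from the test suite itself.

```python
def overall_verdict(results, required_evidence):
    """Aggregate per-test statuses into one acceptance verdict.

    Assumption (not confirmed by the report): any status other than PASS
    blocks acceptance, and every required evidence item must be present
    as a key in `results`.
    """
    problems = {name: status for name, status in results.items() if status != "PASS"}
    missing = [name for name in required_evidence if name not in results]
    verdict = "PASS" if not problems and not missing else "FAIL"
    return verdict, problems, missing


# Statuses copied from this report's summary.
results = {
    "GPU Info": "PASS",
    "Health Check": "PASS",
    "Memory Bandwidth": "PASS",
    "Compute Throughput": "FAIL",
    "NCCL": "FAIL",
    "Stress Test": "PASS",
    "RDMA": "FAIL",
    "Training": "UNVERIFIED",
}
verdict, problems, missing = overall_verdict(results, ["NVLink/NVSwitch", "DCGM"])
print(verdict, sorted(problems), missing)
```

Treating UNVERIFIED like FAIL is the conservative choice here: an item without an explicit acceptance verdict cannot count toward a pass.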
Summary
| Test | Result |
|---|---|
| GPU Info | PASS (8 GPUs detected) |
| Health Check | PASS |
| Memory Bandwidth | PASS (108.1%) |
| Compute Throughput | FAIL (worst FP32 51.9 TFLOPS vs >= 54 TFLOPS) |
| NCCL | FAIL (no nccl-tests bus BW) |
| Stress Test | PASS |
| RDMA | FAIL |
| Training | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |
GPU Information
| GPU | Model | VRAM | Temp | Power | SM Clock |
|---|---|---|---|---|---|
| 0 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 70/700W | 345 MHz |
| 1 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 2 | NVIDIA H100 80GB HBM3 | 81559 MB | 22C | 67/700W | 345 MHz |
| 3 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 4 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 5 | NVIDIA H100 80GB HBM3 | 81559 MB | 23C | 69/700W | 345 MHz |
| 6 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 7 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 66/700W | 345 MHz |
Health Check
Overall: PASS
| GPU | Temp | Power | ECC | PCIe | Throttle | Status |
|---|---|---|---|---|---|---|
| 0 | 21C PASS | 70W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 1 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 2 | 22C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 3 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 4 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 5 | 23C PASS | 69W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 6 | 21C PASS | 68W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
| 7 | 21C PASS | 66W PASS | S:0 D:0 | Gen5x16 | PASS | WARN |
Memory Bandwidth
Source: nvbandwidth
| Metric | Value | Peak | Efficiency |
|---|---|---|---|
| H2D (PCIe) | 55.5 GB/s | 64 GB/s | 86.7% |
| D2H (PCIe) | 55.3 GB/s | 64 GB/s | 86.4% |
| D2D (NVLink) | 486.5 GB/s | 450 GB/s | 108.1% |
Verdict: PASS (D2D efficiency 108.1%)
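The Efficiency column appears to be the measured bandwidth divided by the reference peak. The sketch below reproduces the three percentages under that assumption. The helper name `efficiency_pct` is hypothetical, and the note about values above 100% is a suggested sanity check, not something the report itself performs.

```python
def efficiency_pct(measured_gbps, peak_gbps):
    """Efficiency as a percentage of the reference peak, rounded to one decimal."""
    return round(100.0 * measured_gbps / peak_gbps, 1)


# (measured GB/s, reference peak GB/s), copied from the table above.
rows = {
    "H2D (PCIe)": (55.5, 64.0),
    "D2H (PCIe)": (55.3, 64.0),
    "D2D (NVLink)": (486.5, 450.0),
}

for name, (measured, peak) in rows.items():
    pct = efficiency_pct(measured, peak)
    # A result above 100% may mean the peak and the measurement are defined differently.
    note = "  <- above reference peak, check how the peak is defined" if pct > 100.0 else ""
    print(f"{name}: {pct}%{note}")
```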
Compute Throughput
| DType | Achieved (TFLOPS) | Peak | Threshold | Status |
|---|---|---|---|---|
| FP32 | 51.9 | 67 | >= 54 | FAIL |
| TF32 | 357.0 | 495 | >= 444 | FAIL |
| FP16 | 664.0 | 990 | >= 734 | FAIL |
| BF16 | 700.1 | 990 | >= 745 | FAIL |
| FP8 | 1116.2 | 1979 | >= 1400 | FAIL |
Verdict: FAIL (absolute TFLOPS thresholds; worst efficiency 56.4%)
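The compute verdict can be checked by comparing each achieved value against its absolute threshold and taking the lowest achieved/peak ratio as the worst efficiency. This is a sketch under that reading of the table. `evaluate_compute` is a hypothetical name, not the suite's own function.

```python
def evaluate_compute(measurements):
    """Apply absolute TFLOPS thresholds and find the lowest efficiency.

    `measurements` maps dtype -> (achieved, peak, threshold), all in TFLOPS.
    """
    statuses = {}
    efficiencies = {}
    for dtype, (achieved, peak, threshold) in measurements.items():
        statuses[dtype] = "PASS" if achieved >= threshold else "FAIL"
        efficiencies[dtype] = round(100.0 * achieved / peak, 1)
    worst = min(efficiencies, key=efficiencies.get)
    verdict = "PASS" if all(s == "PASS" for s in statuses.values()) else "FAIL"
    return verdict, statuses, worst, efficiencies[worst]


# Values copied from the Compute Throughput table.
measurements = {
    "FP32": (51.9, 67.0, 54.0),
    "TF32": (357.0, 495.0, 444.0),
    "FP16": (664.0, 990.0, 734.0),
    "BF16": (700.1, 990.0, 745.0),
    "FP8": (1116.2, 1979.0, 1400.0),
}
verdict, statuses, worst_dtype, worst_eff = evaluate_compute(measurements)
print(verdict, worst_dtype, worst_eff)
```

With the table's numbers, the lowest ratio belongs to FP8, which matches the 56.4% quoted in the verdict line.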
Compute Per-GPU TFLOPS
| GPU | FP32 | TF32 | FP16 | BF16 | FP8 |
|---|---|---|---|---|---|
| 0 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 1 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 2 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 3 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 4 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 5 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 6 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 7 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
NCCL Multi-GPU
Source: torchrun_fallback | GPUs: 8 | NCCL version: 2.21.5+cuda12.4
Functional NCCL smoke only: nccl-tests bus bandwidth was not measured, so this does not satisfy production acceptance.
| Operation | Bus BW (GB/s) | Threshold | Status |
|---|---|---|---|
| allreduce | 0.0 | >= 0 | PASS |
| broadcast | 0.0 | >= 0 | PASS |
| allgather | 0.0 | >= 0 | PASS |
| reducescatter | 0.0 | >= 0 | PASS |
| alltoall | 0.0 | >= 0 | PASS |
Overall: FAIL
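For reference, nccl-tests derives bus bandwidth from algorithm bandwidth (bytes moved divided by elapsed time) using an operation-specific factor that depends on the number of ranks. The sketch below shows that conversion. The function name is hypothetical, and the sample size and time are made-up inputs for illustration, not measurements from this host.

```python
# Bus-bandwidth correction factors as documented by nccl-tests, for n ranks.
BUS_BW_FACTOR = {
    "allreduce": lambda n: 2.0 * (n - 1) / n,
    "allgather": lambda n: (n - 1) / n,
    "reducescatter": lambda n: (n - 1) / n,
    "alltoall": lambda n: (n - 1) / n,
    "broadcast": lambda n: 1.0,
}


def bus_bandwidth_gbps(op, size_bytes, elapsed_s, n_ranks):
    """Convert a timed collective into bus bandwidth in GB/s.

    `size_bytes` is assumed to follow the nccl-tests size convention
    for the given operation.
    """
    alg_bw = size_bytes / elapsed_s / 1e9  # algorithm bandwidth in GB/s
    return alg_bw * BUS_BW_FACTOR[op](n_ranks)


# Illustrative inputs only: 1 GB moved in 10 ms across 8 ranks.
print(bus_bandwidth_gbps("allreduce", 1e9, 0.01, 8))
```

A reported bus bandwidth of 0.0 GB/s, as in the table above, means no such timing was captured, which is why the smoke run cannot stand in for an nccl-tests measurement.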
Stress Test
- Source: pytorch
- Duration: 60s (requested 60s)
- Result: PASS
RDMA/InfiniBand
The legacy RDMA result was re-evaluated against the current PDF acceptance thresholds; the old WARN statuses and the old 50 GB/s and 10 us limits are not used for the verdict.
| Test | Value | Threshold | Status |
|---|---|---|---|
| ib_write_bw | 0.1 GB/s | >= 47 GB/s | FAIL |
| ib_read_bw | 0.1 GB/s | >= 47 GB/s | FAIL |
| ib_write_lat | 4.10 us | <= 2 us | FAIL |
| ib_read_lat | 16.00 us | <= 3.5 us | FAIL |
- Failure reasons:
  - ib_write_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_read_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_write_lat latency 4.1 us > 2 us
  - ib_read_lat latency 16.0 us > 3.5 us

Overall: FAIL
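The re-evaluation above amounts to comparing each measurement with a directional limit: bandwidth must be at least the threshold, latency at most. This is a minimal sketch of that check using the values reported here. `CHECKS`, `OPS`, and `evaluate` are hypothetical names, not the suite's implementation.

```python
import operator

# (metric, measured, comparator, limit, unit), copied from the RDMA results above.
CHECKS = [
    ("ib_write_bw", 0.13, ">=", 47.0, "GB/s"),
    ("ib_read_bw", 0.13, ">=", 47.0, "GB/s"),
    ("ib_write_lat", 4.1, "<=", 2.0, "us"),
    ("ib_read_lat", 16.0, "<=", 3.5, "us"),
]
OPS = {">=": operator.ge, "<=": operator.le}


def evaluate(checks):
    """Return (verdict, failure reasons) for a list of directional checks."""
    failures = []
    for name, value, op, limit, unit in checks:
        if not OPS[op](value, limit):
            failures.append(f"{name} {value} {unit} violates {op} {limit} {unit}")
    return ("PASS" if not failures else "FAIL"), failures


verdict, failures = evaluate(CHECKS)
print(verdict)
for reason in failures:
    print(" -", reason)
```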
Training Simulation
| Metric | Value |
|---|---|
| Model | synthetic_transformer |
| Params | 1470.5M |
| Throughput | 52471 tokens/sec |
| Avg Step Time | 312.3 ms |
| Peak Memory | 27.3 GB |
| Final Loss | 0.0041 |
| Step Jitter | N/A |
| Distributed Mode | N/A |
| Acceptance Gaps | missing fields: passed, step_jitter_pct, distributed_mode, loss_finite |
| Verdict | UNVERIFIED (52471 tokens/sec; legacy result lacks explicit acceptance verdict) |
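The UNVERIFIED verdict follows from the acceptance gaps: a result that lacks the required fields cannot be judged either way. The sketch below shows one way to express this. The field names come from the Acceptance Gaps row, while `training_verdict`, the keys of the `legacy` dict, and the PASS/FAIL rule for complete results are assumptions.

```python
REQUIRED_FIELDS = ("passed", "step_jitter_pct", "distributed_mode", "loss_finite")


def training_verdict(result):
    """Return (verdict, gaps) for a training result dict.

    Assumption: a result missing any required field is UNVERIFIED; a complete
    result passes only if `passed` and `loss_finite` are both true.
    """
    gaps = [field for field in REQUIRED_FIELDS if result.get(field) is None]
    if gaps:
        return "UNVERIFIED", gaps
    ok = bool(result["passed"]) and bool(result["loss_finite"])
    return ("PASS" if ok else "FAIL"), gaps


# Legacy result with only the metrics shown in the table above (key names are hypothetical).
legacy = {
    "throughput_tokens_per_sec": 52471,
    "avg_step_time_ms": 312.3,
    "final_loss": 0.0041,
}
verdict, gaps = training_verdict(legacy)
print(verdict, gaps)
```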
Generated by GPU Test Suite v0.2.0