# GPU Test Report

- **Date:** 2026-05-22T15:49:02.368516
- **Host:** aikubeworker0016
- **GPU:** NVIDIA H100 80GB HBM3 x8
- **Driver:** 580.159.03 | **CUDA:** 13.0

## Overall Acceptance Verdict

**Result: FAIL**

Failed or unverified items:

- Compute Throughput: FAIL (all dtypes below threshold; worst-GPU FP32 51.9 vs >= 54 TFLOPS)
- NCCL: FAIL (no nccl-tests bus bandwidth measured)
- RDMA: FAIL (bandwidth 0.13 GB/s vs >= 47 GB/s; write and read latency above limits)
- Training: UNVERIFIED (52471 tokens/sec; legacy result lacks an explicit acceptance verdict)

Missing required evidence:

- NVLink/NVSwitch
- DCGM

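The overall result combines per-test statuses with the required-evidence list. The report does not state the suite's aggregation rules, so the following is a minimal sketch under an assumed rule: any FAIL makes the node FAIL, and any UNVERIFIED test or missing evidence (with no FAIL) makes it UNVERIFIED. `overall_verdict` and `REQUIRED_EVIDENCE` are hypothetical names.

```python
# Hypothetical sketch: aggregate per-test statuses into an overall acceptance verdict.
# Assumed rule (not documented in the report): FAIL dominates; otherwise any
# UNVERIFIED test or missing required evidence yields UNVERIFIED.

REQUIRED_EVIDENCE = {"NVLink/NVSwitch", "DCGM"}  # hypothetical, taken from this report


def overall_verdict(statuses: dict[str, str], evidence: set[str]) -> tuple[str, list[str], list[str]]:
    """Return (result, blocking_items, missing_evidence)."""
    blocking = [name for name, status in statuses.items() if status in ("FAIL", "UNVERIFIED")]
    missing = sorted(REQUIRED_EVIDENCE - evidence)
    if any(statuses[name] == "FAIL" for name in blocking):
        result = "FAIL"
    elif blocking or missing:
        result = "UNVERIFIED"
    else:
        result = "PASS"
    return result, blocking, missing


statuses = {
    "GPU Info": "PASS",
    "Health Check": "PASS",
    "Memory Bandwidth": "PASS",
    "Compute Throughput": "FAIL",
    "NCCL": "FAIL",
    "Stress Test": "PASS",
    "RDMA": "FAIL",
    "Training": "UNVERIFIED",
}
result, blocking, missing = overall_verdict(statuses, evidence=set())
print(result)    # FAIL
print(blocking)  # ['Compute Throughput', 'NCCL', 'RDMA', 'Training']
print(missing)   # ['DCGM', 'NVLink/NVSwitch']
```

With the statuses from this run, the sketch reproduces the FAIL result and both lists above.
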
## Summary

| Test | Result |
|------|--------|
| GPU Info | PASS (8 GPUs detected) |
| Health Check | PASS |
| Memory Bandwidth | PASS (D2D efficiency 108.1%) |
| Compute Throughput | FAIL (all dtypes below threshold; worst-GPU FP32 51.9 vs >= 54 TFLOPS) |
| NCCL | FAIL (no nccl-tests bus bandwidth measured) |
| Stress Test | PASS |
| RDMA | FAIL (bandwidth 0.13 GB/s vs >= 47 GB/s; latency above limits) |
| Training | UNVERIFIED (52471 tokens/sec; legacy result lacks an explicit acceptance verdict) |

## GPU Information

| GPU | Model | VRAM | Temp | Power | SM Clock |
|-----|-------|------|------|-------|----------|
| 0 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 70/700W | 345 MHz |
| 1 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 2 | NVIDIA H100 80GB HBM3 | 81559 MB | 22C | 67/700W | 345 MHz |
| 3 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 4 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 67/700W | 345 MHz |
| 5 | NVIDIA H100 80GB HBM3 | 81559 MB | 23C | 69/700W | 345 MHz |
| 6 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 68/700W | 345 MHz |
| 7 | NVIDIA H100 80GB HBM3 | 81559 MB | 21C | 66/700W | 345 MHz |

## Health Check

**Overall: PASS**

| GPU | Temp | Power | ECC | PCIe | Throttle | Status |
|-----|------|-------|-----|------|----------|--------|
| 0 | 21C PASS | 70W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 1 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 2 | 22C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 3 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 4 | 21C PASS | 67W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 5 | 23C PASS | 69W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 6 | 21C PASS | 68W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |
| 7 | 21C PASS | 66W PASS | S:0 D:0 | Gen5x16 | PASS | **WARN** |

## Memory Bandwidth

Source: nvbandwidth

| Metric | Value | Peak | Efficiency |
|--------|-------|------|------------|
| H2D (PCIe) | 55.5 GB/s | 64 GB/s | 86.7% |
| D2H (PCIe) | 55.3 GB/s | 64 GB/s | 86.4% |
| D2D (NVLink) | 486.5 GB/s | 450 GB/s | 108.1% |

**Verdict: PASS** (D2D efficiency 108.1%)

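The Efficiency column is the measured bandwidth divided by the configured peak. A minimal sketch that reproduces it from the values in the table; `MIN_D2D_EFFICIENCY_PCT` is a hypothetical placeholder, since the report does not state the suite's D2D pass threshold.

```python
# Sketch: reproduce the Efficiency column as measured / peak * 100.
# MIN_D2D_EFFICIENCY_PCT is a hypothetical threshold, not taken from the suite.

MIN_D2D_EFFICIENCY_PCT = 80.0  # hypothetical

# metric: (measured GB/s, peak GB/s), values copied from the table above
measurements = {
    "H2D (PCIe)": (55.5, 64.0),
    "D2H (PCIe)": (55.3, 64.0),
    "D2D (NVLink)": (486.5, 450.0),
}


def efficiency_pct(measured: float, peak: float) -> float:
    """Percentage of peak, rounded to one decimal like the report."""
    return round(measured / peak * 100, 1)


for metric, (measured, peak) in measurements.items():
    print(f"{metric}: {efficiency_pct(measured, peak)}%")

d2d = efficiency_pct(*measurements["D2D (NVLink)"])
print("PASS" if d2d >= MIN_D2D_EFFICIENCY_PCT else "FAIL")
```

An efficiency above 100% only means the measured value exceeds the configured peak, so the 450 GB/s reference may not describe the same transfer pattern that nvbandwidth measured.
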
## Compute Throughput

| DType | Achieved (TFLOPS) | Peak (TFLOPS) | Threshold (TFLOPS) | Status |
|-------|-------------------|---------------|--------------------|--------|
| FP32 | 51.9 | 67 | >= 54 | FAIL |
| TF32 | 357.0 | 495 | >= 444 | FAIL |
| FP16 | 664.0 | 990 | >= 734 | FAIL |
| BF16 | 700.1 | 990 | >= 745 | FAIL |
| FP8 | 1116.2 | 1979 | >= 1400 | FAIL |

**Verdict: FAIL** (absolute TFLOPS thresholds; all dtypes below threshold; lowest efficiency 56.4%, FP8)

### Compute Per-GPU TFLOPS

| GPU | FP32 | TF32 | FP16 | BF16 | FP8 |
|-----|------|------|------|------|-----|
| 0 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 1 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 2 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 3 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 4 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 5 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 6 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |
| 7 | 51.9 | 357.0 | 664.0 | 700.1 | 1116.2 |

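The compute verdict uses absolute TFLOPS thresholds and also reports the lowest efficiency. A minimal sketch of that check, assuming "worst" means the minimum achieved value across GPUs for each dtype; the peak and threshold values are copied from the tables above, and `evaluate` is a hypothetical helper.

```python
# Sketch: check the worst GPU per dtype against absolute TFLOPS thresholds and
# compute efficiency (achieved / peak). Taking the minimum across GPUs as
# "worst" is an assumption about how the suite defines it.

PEAK = {"FP32": 67, "TF32": 495, "FP16": 990, "BF16": 990, "FP8": 1979}
THRESHOLD = {"FP32": 54, "TF32": 444, "FP16": 734, "BF16": 745, "FP8": 1400}

# Per-GPU achieved TFLOPS; all 8 GPUs reported identical values in this run.
per_gpu = {
    gpu: {"FP32": 51.9, "TF32": 357.0, "FP16": 664.0, "BF16": 700.1, "FP8": 1116.2}
    for gpu in range(8)
}


def evaluate(per_gpu: dict[int, dict[str, float]]) -> dict[str, tuple[float, float, str]]:
    """Return {dtype: (worst TFLOPS, efficiency %, status)}."""
    results = {}
    for dtype in PEAK:
        worst = min(values[dtype] for values in per_gpu.values())
        status = "PASS" if worst >= THRESHOLD[dtype] else "FAIL"
        results[dtype] = (worst, round(worst / PEAK[dtype] * 100, 1), status)
    return results


results = evaluate(per_gpu)
for dtype, (worst, eff, status) in results.items():
    print(f"{dtype}: {worst} TFLOPS, {eff}% of peak, {status}")

# Lowest efficiency across dtypes, as quoted in the verdict line.
lowest = min(results.items(), key=lambda item: item[1][1])
print("lowest efficiency:", lowest[0], lowest[1][1])  # lowest efficiency: FP8 56.4
```

Keeping the absolute threshold separate from the efficiency figure mirrors the report: the status comes from TFLOPS, while efficiency is informational.
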
## NCCL Multi-GPU

Source: torchrun_fallback | GPUs: 8 | NCCL: 2.21.5+cuda12.4

> Functional NCCL smoke test only: nccl-tests bus bandwidth was not measured, so this result does not satisfy production acceptance.

| Operation | Bus BW (GB/s) | Threshold | Status |
|-----------|---------------|-----------|--------|
| allreduce | 0.0 | >= 0 | PASS |
| broadcast | 0.0 | >= 0 | PASS |
| allgather | 0.0 | >= 0 | PASS |
| reducescatter | 0.0 | >= 0 | PASS |
| alltoall | 0.0 | >= 0 | PASS |

**Overall: FAIL** (collectives ran functionally, but no nccl-tests bus bandwidth was measured)

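Bus bandwidth, as nccl-tests defines it, scales algorithm bandwidth (bytes moved divided by elapsed time) by a per-collective factor so results are comparable across rank counts. A minimal sketch of that convention; the helper names and the example size and timing are hypothetical and do not come from this host.

```python
# Sketch of the nccl-tests bus-bandwidth convention:
#   algbw = size / time
#   busbw = algbw * factor(op, n), with n = number of ranks.
# Function names and the example numbers are hypothetical.


def bus_bw_factor(op: str, n: int) -> float:
    """Correction factor from algorithm bandwidth to bus bandwidth."""
    if op == "allreduce":
        return 2 * (n - 1) / n
    if op in ("allgather", "reducescatter", "alltoall"):
        return (n - 1) / n
    if op in ("broadcast", "reduce"):
        return 1.0
    raise ValueError(f"unknown op: {op}")


def bus_bw_gbps(op: str, n: int, size_bytes: int, seconds: float) -> float:
    alg_bw = size_bytes / seconds / 1e9  # GB/s, with GB = 1e9 bytes
    return alg_bw * bus_bw_factor(op, n)


# Hypothetical example: an 8-GPU all-reduce of 1 GiB completing in 5 ms.
print(round(bus_bw_gbps("allreduce", 8, 1 << 30, 0.005), 1))  # 375.8
```

Because the torchrun fallback does not time collectives this way, the 0.0 values in the table cannot be compared against any bus-bandwidth threshold.
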
## Stress Test

- **Source:** pytorch
- **Duration:** 60s (requested 60s)
- **Result: PASS**

## RDMA/InfiniBand

> Legacy RDMA results were re-evaluated against the current PDF acceptance thresholds; the old WARN statuses and the old 50 GB/s bandwidth and 10 us latency limits are not used for the verdict.

| Test | Value | Threshold | Status |
|------|-------|-----------|--------|
| ib_write_bw | 0.13 GB/s | >= 47 GB/s | FAIL |
| ib_read_bw | 0.13 GB/s | >= 47 GB/s | FAIL |
| ib_write_lat | 4.10 us | <= 2 us | FAIL |
| ib_read_lat | 16.00 us | <= 3.5 us | FAIL |

- **Failure reasons:**
  - ib_write_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_read_bw bandwidth 0.13 GB/s < 47 GB/s
  - ib_write_lat latency 4.1 us > 2 us
  - ib_read_lat latency 16.0 us > 3.5 us

**Overall: FAIL**

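The re-evaluation applies a lower bound to the perftest bandwidth results and an upper bound to the latency results. A minimal sketch that produces the failure reasons from the legacy values; the thresholds are copied from the table, while `THRESHOLDS` and `evaluate_rdma` are hypothetical names and the data layout is an assumption.

```python
# Sketch: re-evaluate legacy perftest results against the current thresholds.
# Bandwidth tests must be >= the limit (GB/s); latency tests must be <= it (us).
# Names and data layout are hypothetical.

THRESHOLDS = {
    "ib_write_bw": ("bw", 47.0),
    "ib_read_bw": ("bw", 47.0),
    "ib_write_lat": ("lat", 2.0),
    "ib_read_lat": ("lat", 3.5),
}


def evaluate_rdma(results: dict[str, float]) -> list[str]:
    """Return human-readable failure reasons; an empty list means PASS."""
    reasons = []
    for test, value in results.items():
        kind, limit = THRESHOLDS[test]
        if kind == "bw" and value < limit:
            reasons.append(f"{test} bandwidth {value} GB/s < {limit:g} GB/s")
        elif kind == "lat" and value > limit:
            reasons.append(f"{test} latency {value} us > {limit:g} us")
    return reasons


legacy = {"ib_write_bw": 0.13, "ib_read_bw": 0.13, "ib_write_lat": 4.1, "ib_read_lat": 16.0}
reasons = evaluate_rdma(legacy)
print("FAIL" if reasons else "PASS")
for reason in reasons:
    print(" -", reason)
```

Any legacy WARN status is ignored here by construction: only the numeric value and the current limit decide the outcome.
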
## Training Simulation

| Metric | Value |
|--------|-------|
| Model | synthetic_transformer |
| Params | 1470.5M |
| Throughput | 52471 tokens/sec |
| Avg Step Time | 312.3 ms |
| Peak Memory | 27.3 GB |
| Final Loss | 0.0041 |
| Step Jitter | N/A |
| Distributed Mode | N/A |
| Acceptance Gaps | missing fields: `passed`, `step_jitter_pct`, `distributed_mode`, `loss_finite` |
| Verdict | UNVERIFIED (52471 tokens/sec; legacy result lacks an explicit acceptance verdict) |

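The verdict is UNVERIFIED because the legacy result lacks `passed`, `step_jitter_pct`, `distributed_mode`, and `loss_finite`. A hypothetical sketch of how those fields could be derived from raw per-step data: the jitter definition (population standard deviation of step times divided by the mean, in percent), the `MAX_JITTER_PCT` limit, and the `distributed_mode` labels are all assumptions, not the suite's documented rules.

```python
import math
import statistics

# Hypothetical sketch: derive the missing acceptance fields from raw training data.
# Assumptions: jitter = pstdev(step times) / mean * 100; MAX_JITTER_PCT and the
# distributed_mode labels are placeholders, not taken from the suite.

MAX_JITTER_PCT = 5.0  # hypothetical limit


def training_acceptance(step_times_s: list[float], final_loss: float, world_size: int) -> dict:
    mean = statistics.mean(step_times_s)
    jitter_pct = statistics.pstdev(step_times_s) / mean * 100
    loss_finite = math.isfinite(final_loss)
    distributed_mode = "distributed" if world_size > 1 else "single"  # hypothetical labels
    return {
        "passed": loss_finite and jitter_pct <= MAX_JITTER_PCT,
        "step_jitter_pct": round(jitter_pct, 2),
        "distributed_mode": distributed_mode,
        "loss_finite": loss_finite,
    }


# Hypothetical step times near the reported 312.3 ms average.
result = training_acceptance([0.310, 0.312, 0.315, 0.312], final_loss=0.0041, world_size=8)
print(result)
```

Recording these fields alongside throughput would let a future run produce an explicit PASS or FAIL instead of UNVERIFIED.
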
---

*Generated by GPU Test Suite v0.2.0*