test_gpu_scripts


zulifeng ef2ca11c58 fix: resolve FP8 benchmark, NCCL parsing, and report None-value bugs

- benchmark.py: the FP8 dtype now uses torch._scaled_mm() with scale tensors
  instead of torch.matmul(), which does not support float8_e4m3fn on Hopper;
  this fixes the "addmm_cuda not implemented" error and enables FP8 TFLOPS measurement

- nccl_test.py: fix two bugs that caused all-zero bandwidth results:
  1. buffer size changed from -b 8 (8 bytes) to -b 8M -e 8G to generate meaningful load
  2. column parser corrected: parts[2] is the dtype string, not the time value;
     it now reads time=parts[5], algbw=parts[6], busbw=parts[7] per the nccl-tests output format

- report.py: replace .get(key, 0) with .get(key) or 0 for all bandwidth/stress
  fields to handle None values stored in result dicts (dict.get returns its default
  only when the key is missing, not when the stored value is None)

- .gitignore: exclude .claude/settings.local.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-10 15:53:12 +08:00
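The two nccl_test.py fixes (a realistic size sweep and the corrected column indices) can be illustrated with a small command builder and row parser. This is a hypothetical sketch, not the script's actual code: the names `build_nccl_cmd` and `parse_nccl_line`, the `-f 2` step factor, and the output dictionary keys are assumptions. The column positions follow the commit message: in an out-of-place data row from an nccl-tests binary such as `all_reduce_perf`, index 2 is the dtype and indices 5, 6 and 7 are time (us), algbw and busbw (GB/s).

```python
from typing import Optional


def build_nccl_cmd(binary: str, ngpus: int) -> list[str]:
    # -b / -e: smallest and largest message size in bytes (8M .. 8G, as in the fix).
    # -f 2: double the size at each step (assumed; not stated in the commit).
    # -g: number of GPUs driven by this process.
    return [binary, "-b", "8M", "-e", "8G", "-f", "2", "-g", str(ngpus)]


def parse_nccl_line(line: str) -> Optional[dict]:
    """Parse one data row of nccl-tests output; return None for headers/summaries.

    Assumed out-of-place column layout:
      size count type redop root time algbw busbw #wrong  (then in-place columns)
    """
    line = line.strip()
    # Header, comment and summary lines ("# Avg bus bandwidth ...") start with '#'.
    if not line or line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) < 8 or not parts[0].isdigit():
        return None
    try:
        return {
            "size_bytes": int(parts[0]),
            "dtype": parts[2],            # a string like "float", not a time value
            "time_us": float(parts[5]),
            "algbw_gbps": float(parts[6]),
            "busbw_gbps": float(parts[7]),
        }
    except ValueError:
        return None
```

For example, a row with invented values such as `8388608 2097152 float sum -1 123.4 67.98 118.97 0 ...` parses to a time of 123.4 us and a bus bandwidth of 118.97 GB/s. Reading `parts[2]` as the time would instead hit the `"float"` dtype string, which is the kind of mismatch the commit describes.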
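The report.py fix depends on how `dict.get` treats its default. The following minimal sketch contrasts the two patterns using hypothetical helpers (`get_metric_buggy`, `get_metric`, `format_bw`), not the report's real functions.

```python
def get_metric_buggy(result: dict, key: str):
    # dict.get falls back to the default only when the key is absent,
    # so an explicitly stored None is returned unchanged.
    return result.get(key, 0)


def get_metric(result: dict, key: str):
    # `or 0` also replaces a stored None (and any other falsy value) with 0.
    return result.get(key) or 0


def format_bw(result: dict, key: str) -> str:
    # Formatting None with ":.2f" raises TypeError; get_metric avoids that.
    return f"{get_metric(result, key):.2f} GB/s"
```

With `{"bw": None}`, `format_bw` yields `"0.00 GB/s"`, whereas formatting `get_metric_buggy(...)` with `:.2f` raises `TypeError`. One trade-off is that `or 0` also collapses `0.0` and other falsy values to `0`. This is harmless for numeric bandwidth fields, but an explicit `is None` check is needed if "not measured" must stay distinguishable from a measured zero.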

__init__.py

init: project scaffolding with README, config, and requirements

2026-04-25 17:23:27 +08:00

benchmark.py

fix: resolve FP8 benchmark, NCCL parsing, and report None-value bugs

2026-05-10 15:53:12 +08:00
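The FP8 change in benchmark.py can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the names `bench_fp8_gemm` and `gemm_tflops`, the 8192-cubed problem size, the iteration counts, and the unit scales are all hypothetical. `torch._scaled_mm` is a private PyTorch API whose signature has changed between releases, and FP8 GEMMs need an Ada or Hopper class GPU. `torch` is imported inside the benchmark function so that the TFLOPS arithmetic can be used without a GPU.

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    # An (m x k) @ (k x n) GEMM performs 2*m*n*k floating-point operations
    # (one multiply and one add per multiply-accumulate).
    return 2.0 * m * n * k / (elapsed_ms * 1e-3) / 1e12


def bench_fp8_gemm(m: int = 8192, n: int = 8192, k: int = 8192, iters: int = 50) -> float:
    import torch  # lazy import: only needed when actually benchmarking on CUDA

    dev = "cuda"
    a = torch.randn(m, k, device=dev).to(torch.float8_e4m3fn)
    # _scaled_mm expects the second operand in column-major layout,
    # so build it as (n, k) and transpose to a (k, n) view.
    b = torch.randn(n, k, device=dev).to(torch.float8_e4m3fn).t()
    # Per-tensor float32 scales; 1.0 means no rescaling (assumed for simplicity).
    scale_a = torch.tensor(1.0, device=dev)
    scale_b = torch.tensor(1.0, device=dev)

    def run():
        out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b,
                               out_dtype=torch.bfloat16)
        # Older PyTorch releases returned an (output, amax) tuple.
        return out[0] if isinstance(out, tuple) else out

    for _ in range(5):  # warm-up launches
        run()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        run()
    end.record()
    torch.cuda.synchronize()
    elapsed_ms = start.elapsed_time(end) / iters  # average time per GEMM
    return gemm_tflops(m, n, k, elapsed_ms)
```

On a machine with a suitable GPU, `bench_fp8_gemm()` returns the measured FP8 throughput in TFLOPS. `gemm_tflops` is a plain function, so the FLOP-count formula can be checked separately. For example, a 1000-cubed GEMM that takes 1 ms corresponds to 2 TFLOPS.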

gpu_info.py

fix: resolve stress OOM, D2D efficiency calculation, NCCL execution failures

2026-05-07 18:09:22 +08:00

gpu_specs.py

refactor: remove hardcoding, fix AMP bug, unify English output