- benchmark.py: the FP8 path now calls torch._scaled_mm() with scale tensors
  instead of torch.matmul(), which does not support float8_e4m3fn on Hopper;
  fixes the "addmm_cuda not implemented" error and enables FP8 TFLOPS measurement
- nccl_test.py: fix two bugs causing all-zero bandwidth results:
  1. buffer size changed from -b 8 (8 bytes) to -b 8M -e 8G for meaningful load
  2. column parser corrected: parts[2] is the dtype string, not a time value;
     now reads time=parts[5], algbw=parts[6], busbw=parts[7] per nccl-tests format
- report.py: replace .get(key, 0) with .get(key) or 0 for all bandwidth/stress
  fields to handle None values stored in result dicts (dict.get with a default
  does not override an explicitly stored None)
- .gitignore: exclude .claude/settings.local.json
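
The corrected column mapping for nccl_test.py can be sketched as below. This is a minimal illustration, not the code from nccl_test.py: the function name parse_nccl_line, the dict keys, and the sample row (made-up numbers, not measurements) are all hypothetical. It assumes the usual nccl-tests row layout of size, count, type, redop, root, then out-of-place time, algbw, busbw.

```python
def parse_nccl_line(line):
    """Parse one data row of nccl-tests output.

    Returns None for comment/header lines and anything that does not
    look like a data row.
    """
    parts = line.split()
    # Header and summary lines start with '#'; data rows have at least 8 columns.
    if len(parts) < 8 or parts[0].startswith("#"):
        return None
    try:
        return {
            "size_bytes": int(parts[0]),
            "dtype": parts[2],               # e.g. "float"; not a time value
            "time_us": float(parts[5]),      # out-of-place time
            "algbw_gbps": float(parts[6]),   # algorithm bandwidth
            "busbw_gbps": float(parts[7]),   # bus bandwidth
        }
    except ValueError:
        # A non-numeric token where a number was expected: not a data row.
        return None


# Hypothetical sample row with invented values, for illustration only.
sample = ("     8388608       2097152     float     sum      -1"
          "    123.4   67.98  118.96      0    120.1   69.85  122.23      0")
row = parse_nccl_line(sample)
header = parse_nccl_line("#       size         count      type   redop    root")
```

Reading parts[2] as a time would fail to convert or yield nonsense, which is consistent with the all-zero results described above.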
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
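
The dict.get behavior behind the report.py change can be shown in a few lines. This is a small sketch with a hypothetical result dict and key names, not the actual report.py data.

```python
# Hypothetical result dict: a failed test stored None explicitly.
result = {"busbw": None}

# The default is only used when the key is absent, so None comes back as-is.
with_default = result.get("busbw", 0)

# "or 0" replaces any falsy value, including an explicitly stored None.
with_or = result.get("busbw") or 0

# For a key that is truly missing, both forms give 0.
missing_default = result.get("algbw", 0)
missing_or = result.get("algbw") or 0
```

Note that `or 0` also maps other falsy values (0.0, an empty string) to 0, which is harmless for numeric bandwidth fields but worth keeping in mind elsewhere.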
18 lines
158 B
Plaintext
__pycache__/
*.pyc
*.pyo
.pytest_cache/
*.egg-info/
dist/
build/
reports/
*.egg
.eggs/
*.log
.DS_Store
.env
.venv/
venv/
.qoder/*
.claude/settings.local.json