Key changes: - stress_test: use torch.cuda.mem_get_info() for free memory instead of total, allocate 40% to avoid OOM when other processes occupy GPU memory - benchmark: fix D2D efficiency by comparing to NVLink per-direction bandwidth (not HBM), add H2D/D2H efficiency against PCIe peak - nccl_test: implement direct binary → mpirun → torchrun fallback chain, fix min_bw None bug when YAML value is empty - report: update memory section to use per-metric peak fields - install_deps.sh: add NCCL compatibility detection, enhance CUDA version detection with CUDA_HOME/standard paths, improve _map_cuda_tag logging - gpu_info: parse CUDA version from nvidia-smi header (query field removed in newer drivers) - health_check: parse throttle_reasons bitmask properly, ignore gpu_idle bit - gpu_tester: fix suite summary to exclude metadata keys from pass count 🤖 Generated with [Qoder][https://qoder.com]
17 lines
130 B
Plaintext
17 lines
130 B
Plaintext
__pycache__/
|
|
*.pyc
|
|
*.pyo
|
|
.pytest_cache/
|
|
*.egg-info/
|
|
dist/
|
|
build/
|
|
reports/
|
|
*.egg
|
|
.eggs/
|
|
*.log
|
|
.DS_Store
|
|
.env
|
|
.venv/
|
|
venv/
|
|
.qoder/*
|