test_gpu_scripts

Author	SHA1	Message	Date
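zulifeng	375d439abb	feat: add H20 support, improve compute benchmark accuracy, and fix several stability issues - gpu_specs: add H20/H20-3e (China-compliant variant of H200) spec definitions and fix the GPU name matching order so that "H200" is no longer mis-matched by the "H20" substring - benchmark(compute): introduce matrix pool rotation to avoid L2 cache hits, plus optional torch.compile(max-autotune); add _scaled_mm probing for FP8; significantly improves the accuracy of measured FP16/BF16/FP8 throughput - benchmark(memory): add --disableAffinity to nvbandwidth to work around the fabricmanager NVML incompatibility; automatically fall back to PyTorch when all results are 0; exclude zero-valued diagonal entries from the D2D average - nccl: use a separate bandwidth threshold ratio for each collective operation (AllReduce/AllToAll/Broadcast, etc.) to avoid false WARN results for AllToAll - rdma: filter ports only by link_layer=InfiniBand; SKIP instead of raising an error when there is no IB hardware or all ports are DOWN - stress: cap the compute matrix size at 4096 and dispatch work to all GPUs concurrently before a single synchronization, fixing severe duration overruns caused by serial execution across 8 GPUs - report: handle the RDMA SKIP status and the PyTorch fallback case in the Memory verdict so that fallback results are not misjudged as FAIL - config: add the benchmark.compute.use_compile switch Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-12 21:41:46 +08:00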
qinyusen	fefef8e03b	refactor: remove hardcoding, fix AMP bug, unify English output - Fix AMP autocast: bf16 now uses torch.amp.autocast (was skipped) - Fix NCCL threshold: unknown GPU gets 10 GB/s floor instead of 0 - Fix PCIe health check: use specs-driven pcie_gen, not hardcoded Gen4 - Remove hardcoded GPU lists: dynamic banner, CLI choices, version - Unknown GPU efficiency displays N/A instead of 0% - Unify all console output to English (stress_test, gpu_tester) - Use importlib.metadata for runtime version resolution - Remove dir="/tmp" from tempfile (use system default) 🤖 Generated with [Qoder][https://qoder.com]	2026-05-07 21:32:35 +08:00
qinyusen	f2158f6cd3	fix: resolve stress OOM, D2D efficiency calculation, NCCL execution failures Key changes: - stress_test: use torch.cuda.mem_get_info() for free memory instead of total, allocate 40% to avoid OOM when other processes occupy GPU memory - benchmark: fix D2D efficiency by comparing to NVLink per-direction bandwidth (not HBM), add H2D/D2H efficiency against PCIe peak - nccl_test: implement direct binary → mpirun → torchrun fallback chain, fix min_bw None bug when YAML value is empty - report: update memory section to use per-metric peak fields - install_deps.sh: add NCCL compatibility detection, enhance CUDA version detection with CUDA_HOME/standard paths, improve _map_cuda_tag logging - gpu_info: parse CUDA version from nvidia-smi header (query field removed in newer drivers) - health_check: parse throttle_reasons bitmask properly, ignore gpu_idle bit - gpu_tester: fix suite summary to exclude metadata keys from pass count 🤖 Generated with [Qoder][https://qoder.com]	2026-05-07 18:09:22 +08:00
qinyusen	3e967dd34a	feat: add Ampere (A100/A800) support and generalize project naming - Expand GPU specs database to include A100/A800 with Ampere architecture parameters - Rename h200_tester.py to gpu_tester.py for architecture-neutral branding - Add driver/CUDA compatibility validation per GPU generation - Enhance report module with HTML and Markdown output formats - Improve nvbandwidth binary discovery (system paths, DCGM locations) - Add pyproject.toml with uv for dependency management - Update install_deps.sh, configs, and README for multi-architecture support 🤖 Generated with [Qoder][https://qoder.com]	2026-05-07 01:02:28 +08:00
qinyusen	1c6ba4809a	add: stress test (gpu-burn) and RDMA/IB test modules Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-04-25 17:23:57 +08:00
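
Commit 375d439abb mentions fixing the GPU name matching order so that "H200" is not caught by the "H20" substring. The repository's actual code is not shown here, so the following is only a minimal sketch of one way to get that behavior: try longer keys before shorter ones. The table `GPU_SPECS` and the function `match_gpu_key` are hypothetical names made up for illustration.

```python
# Hypothetical spec table; keys and fields are illustrative, not the project's real data.
GPU_SPECS = {
    "H20": {"arch": "Hopper"},
    "H200": {"arch": "Hopper"},
    "H100": {"arch": "Hopper"},
    "A100": {"arch": "Ampere"},
}


def match_gpu_key(device_name, specs=GPU_SPECS):
    """Return the spec key contained in device_name, preferring longer keys.

    Checking longer keys first means "NVIDIA H200" resolves to "H200"
    before the shorter key "H20" gets a chance to match as a substring.
    """
    upper_name = device_name.upper()
    for key in sorted(specs, key=len, reverse=True):
        if key.upper() in upper_name:
            return key
    return None  # unknown GPU: the caller decides how to report it


if __name__ == "__main__":
    for name in ("NVIDIA H200", "NVIDIA H20", "NVIDIA H20-3e", "Some Other GPU"):
        print(name, "->", match_gpu_key(name))
```

Sorting by key length is one option; an explicitly ordered list of patterns would work as well and makes the priority visible in the table itself.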

5 Commits
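
Commit f2158f6cd3 describes parsing the throttle_reasons bitmask and ignoring the gpu_idle bit. As a rough sketch of that idea (not the project's health_check code), the snippet below decodes a mask and drops the idle bit before reporting. The bit values are assumed to follow NVML's `nvmlClocksThrottleReason*` constants and should be checked against the `nvml.h` shipped with your driver; the input is assumed to be an integer or a hex string, and `active_throttle_reasons` is a hypothetical helper name.

```python
# Assumed bit layout (verify against your driver's nvml.h).
GPU_IDLE = 0x1  # set whenever the GPU is idle; not a fault condition

THROTTLE_BITS = {
    0x2: "applications_clocks_setting",
    0x4: "sw_power_cap",
    0x8: "hw_slowdown",
    0x10: "sync_boost",
    0x20: "sw_thermal_slowdown",
    0x40: "hw_thermal_slowdown",
    0x80: "hw_power_brake_slowdown",
}


def active_throttle_reasons(raw):
    """Decode a throttle-reasons mask, ignoring the gpu_idle bit.

    `raw` may be an int or a hex string such as "0x0000000000000005".
    """
    mask = int(raw, 16) if isinstance(raw, str) else int(raw)
    mask &= ~GPU_IDLE  # an idle GPU should not count as throttled
    return [name for bit, name in THROTTLE_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(active_throttle_reasons("0x0000000000000001"))  # idle only
    print(active_throttle_reasons("0x0000000000000005"))  # idle + power cap
```

A health check built on this would treat an empty list as healthy and could decide per reason whether to warn or fail.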