8 Commits

Author SHA1 Message Date
qinyusen
1c6ba4809a add: stress test (gpu-burn) and RDMA/IB test modules
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:57 +08:00
qinyusen
eac1438227 add: NCCL test module (nccl-tests integration + torchrun fallback)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:54 +08:00
qinyusen
65f10dd365 add: benchmark module (nvbandwidth integration + PyTorch compute)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:49 +08:00
qinyusen
b6dff76ef7 add: health check module (temperature, power, ECC, PCIe, system checks)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:44 +08:00
qinyusen
f5fdde5fc1 add: GPU information module (nvidia-smi wrapper, NVLink topology)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:40 +08:00
qinyusen
d4f46b6394 add: CLI entry point with interactive menu and argument parsing
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:35 +08:00
qinyusen
65cf7feee5 add: dependency installation script (nvbandwidth, nccl-tests, gpu-burn)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:32 +08:00
qinyusen
418dc70efb init: project scaffolding with README, config, and requirements
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-25 17:23:27 +08:00