| Author | Commit | Message | Date |
|---|---|---|---|
| qinyusen | 52fe96f2f5 | refactor: replace hardcoded H200 specs with dynamic GPU detection<br>Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-05-06 19:31:51 +08:00 |
| qinyusen | 98e4977e28 | add: GPU specs database with auto-detection (H100/H200/B200/B300)<br>Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-05-06 19:31:44 +08:00 |
| qinyusen | 82cd4d5180 | add: training simulation and report generation modules<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:24:01 +08:00 |
| qinyusen | 1c6ba4809a | add: stress test (gpu-burn) and RDMA/IB test modules<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:57 +08:00 |
| qinyusen | eac1438227 | add: NCCL test module (nccl-tests integration + torchrun fallback)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:54 +08:00 |
| qinyusen | 65f10dd365 | add: benchmark module (nvbandwidth integration + PyTorch compute)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:49 +08:00 |
| qinyusen | b6dff76ef7 | add: health check module (temperature, power, ECC, PCIe, system checks)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:44 +08:00 |
| qinyusen | f5fdde5fc1 | add: GPU information module (nvidia-smi wrapper, NVLink topology)<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:40 +08:00 |
| qinyusen | 418dc70efb | init: project scaffolding with README, config, and requirements<br>Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> | 2026-04-25 17:23:27 +08:00 |