From 7ec2da18bc66f5199b97863ea8e4dea01dd801f0 Mon Sep 17 00:00:00 2001
From: cs <shi.chen@robotics.cc>
Date: Tue, 26 May 2026 00:15:48 +0800
Subject: [PATCH] Clean report whitespace

---
 docs/multinode_nccl_concepts.md                | 1 -
 reports_fp8_path_comparison_20260525.md        | 7 +++----
 reports_gpu_Test_formal_20260524.md            | 1 -
 reports_gpu_Test_pdf.css                       | 1 -
 reports_test_all_latest_summary_cn_20260523.md | 2 +-
 5 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/docs/multinode_nccl_concepts.md b/docs/multinode_nccl_concepts.md
index 1c6039d..52d9b87 100644
--- a/docs/multinode_nccl_concepts.md
+++ b/docs/multinode_nccl_concepts.md
@@ -359,4 +359,3 @@ flowchart TD
 ```
 
 因此，多机多卡测试不是一个命令，而是一条验证链路。
-
diff --git a/reports_fp8_path_comparison_20260525.md b/reports_fp8_path_comparison_20260525.md
index c245b15..6c5d9cf 100644
--- a/reports_fp8_path_comparison_20260525.md
+++ b/reports_fp8_path_comparison_20260525.md
@@ -1,8 +1,8 @@
 # FP8 GEMM 路径对比测试报告
 
-测试日期：2026-05-25  
-测试节点：aikubeworker0012、aikubeworker0016  
-测试 GPU：NVIDIA H100 80GB HBM3  
+测试日期：2026-05-25
+测试节点：aikubeworker0012、aikubeworker0016
+测试 GPU：NVIDIA H100 80GB HBM3
 测试目标：对比同一 FP8 GEMM 规模下 PyTorch eager、CUDA Graph、Transformer Engine 和 direct cuBLASLt 的性能差异。
 
 ## 一、测试结论
@@ -166,4 +166,3 @@ E 路径 cuBLASLt 算法信息：
 | `/Users/d-robotics/lab/test_gpu_scripts/reports_fp8_paths_combined_aikubeworker0012_20260525_045408.json` | aikubeworker0012 A-E 原始结果 |
 | `/Users/d-robotics/lab/test_gpu_scripts/reports_fp8_paths_combined_aikubeworker0016_20260525_050048.json` | aikubeworker0016 A-E 原始结果 |
 | `/Users/d-robotics/lab/test_gpu_scripts/reports_fp8_path_comparison_20260525.md` | 本中文汇总报告 |
-
diff --git a/reports_gpu_Test_formal_20260524.md b/reports_gpu_Test_formal_20260524.md
index 65969b2..49e2695 100644
--- a/reports_gpu_Test_formal_20260524.md
+++ b/reports_gpu_Test_formal_20260524.md
@@ -120,4 +120,3 @@ NCCL 的 `busbw` 是 collective 通信的逻辑折算带宽，不等同于单条
 3. 单机 8 卡 NCCL 通信在两台节点上结果接近，未观察到明显节点间异常差异。
 4. 多机 2x8 NCCL 正确性通过，跨节点通信功能正常。
 5. 当前多机通信结果应按 4x400Gbps IB rail 环境解释；若后续需要对齐 8x400Gbps 环境，应先确认 rail 数量、NCCL net plugin / SHARP、交换网络策略等配置一致。
-
diff --git a/reports_gpu_Test_pdf.css b/reports_gpu_Test_pdf.css
index 8ef6d39..9a44015 100644
--- a/reports_gpu_Test_pdf.css
+++ b/reports_gpu_Test_pdf.css
@@ -99,4 +99,3 @@ ol {
 li {
   margin: 3px 0;
 }
-
diff --git a/reports_test_all_latest_summary_cn_20260523.md b/reports_test_all_latest_summary_cn_20260523.md
index 9ef9449..87f4eab 100644
--- a/reports_test_all_latest_summary_cn_20260523.md
+++ b/reports_test_all_latest_summary_cn_20260523.md
@@ -1,6 +1,6 @@
 # H100 单节点 test all 中文汇总
 
-生成时间：2026-05-23  
+生成时间：2026-05-23
 测试范围：`aikubeworker0012`、`aikubeworker0016` 单节点 `python gpu_tester.py --test all --report --format md`
 
 原始报告：