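- [Docker One-Click Installation Script](#docker-one-click-installation-script)
  - [Supported Platforms](#supported-platforms)
  - [Installation](#installation)
  - [Testing](#testing)
- [BPU Model Performance Benchmark Tool](#bpu-model-performance-benchmark-tool)
  - [Overview](#overview)
  - [Directory Structure](#directory-structure)
  - [Prebuilt Images](#prebuilt-images)
  - [RDK S100 Series Usage (Nash-e / Nash-m)](#rdk-s100-series-usage-nash-e--nash-m)
  - [RDK X5 Series Usage (Bayes-e)](#rdk-x5-series-usage-bayes-e)
  - [Entering the Container Interactively](#entering-the-container-interactively)
  - [task.json Protocol](#taskjson-protocol)
  - [Output Format](#output-format)
- [CoreMark-PRO CPU Performance Benchmark Tool](#coremark-pro-cpu-performance-benchmark-tool)
  - [Overview](#overview-1)
  - [Directory Structure](#directory-structure-1)
  - [Usage](#usage)
  - [Output Format](#output-format-1)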


---

## Docker One-Click Installation Script

### Supported Platforms

| Hardware Platform | BPU Architecture | RDK OS Version | MiniBoot Version |
|----------|----------|-----------------|-------------------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | >= 4.0.4-Beta | >= 4.0.4-20251015222314 |
| RDK X5 / RDK X5 Module | Bayes-e | >= 3.3.3 | >= Sep-03-2025 |


### Installation

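The script is located at `RDK_Docker_Tools/install_docker/install_docker.sh`. It installs Docker on the RDK board in one step and configures the related network settings. Run the commands below from the `RDK_Docker_Tools` directory.

1. No reboot is required at any point between installation and use.
2. The eth0 network configuration is not modified during installation, so SSH sessions stay connected.
3. Run the installation as the root user.
4. Make sure the board can reach the Internet and that at least 2 GB of free space is available on `/`.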
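Before running the installer, requirements 3 and 4 can be checked with a short pre-flight script. This is a minimal sketch assuming Python 3 is available on the board; `check_install_prereqs` and `MIN_FREE_GB` are hypothetical names, not part of the install script, and Internet access is deliberately not probed.

```python
from __future__ import annotations

import os
import shutil

# Threshold from requirement 4 above: at least 2 GB free on "/".
MIN_FREE_GB = 2.0


def check_install_prereqs(root_path: str = "/", min_free_gb: float = MIN_FREE_GB) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed.

    Only the root-user (requirement 3) and free-space (requirement 4) checks
    are performed; Internet access is not probed.
    """
    problems: list[str] = []
    # The installer must run as root, i.e. with effective UID 0.
    if os.geteuid() != 0:
        problems.append("not running as root")
    # shutil.disk_usage reports sizes in bytes for the filesystem holding root_path.
    free_gb = shutil.disk_usage(root_path).free / (1024 ** 3)
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.2f} GB free on {root_path}, need {min_free_gb} GB")
    return problems


if __name__ == "__main__":
    issues = check_install_prereqs()
    for issue in issues:
        print("[WARN]", issue)
    if not issues:
        print("[OK] root user and free space look fine")
```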

```bash
bash install_docker/install_docker.sh
```

Expected output:

```text
============================================================
   Docker 一键安装脚本 (Ubuntu 22.04 ARM64/x86_64)
============================================================

[INFO]  检查系统架构...
[OK]    架构: aarch64 (ARM64) ✓
[INFO]  检查运行权限...
[OK]    当前为 root 用户 ✓
[INFO]  检查 eth0 网络接口...
[OK]    eth0 状态: UP, IP: 10.112.10.98/24
[INFO]  注意: 本脚本不会修改 eth0 的任何配置
[INFO]  检查 Docker 是否已安装...
[INFO]  检查依赖命令...
...
[OK]    Docker 安装完毕 ✓
[INFO]  配置 Docker 镜像加速源...
[OK]    镜像加速配置完毕 ✓
[INFO]  启动 Docker 服务...
[OK]    Docker 服务已通过 systemd 启动 ✓
...
============================================================
[OK]    Docker 安装完成!
============================================================
```


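### Testing

After installation, verify Docker with the test script. It checks that the Docker service is running and then exercises `docker pull`, `run`, `commit`, `export`, and `save` / `load`: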

```bash
bash install_docker/test_docker.sh
```

Expected output:

```text
============================================================
   Docker 功能测试脚本
============================================================

[INFO]  检查 Docker 服务状态...
[OK]    Docker 服务正常 ✓

------------------------------------------------------------
[INFO]  [1/5] 测试 docker pull...
------------------------------------------------------------
[OK]    docker pull 成功 ✓

------------------------------------------------------------
[INFO]  [2/5] 测试 docker run...
------------------------------------------------------------
[OK]    docker run 成功 ✓

------------------------------------------------------------
[INFO]  [3/5] 测试 docker commit...
------------------------------------------------------------
[OK]    docker commit 成功 -> hello-world-committed:test ✓

------------------------------------------------------------
[INFO]  [4/5] 测试 docker export...
------------------------------------------------------------
[OK]    docker export 成功 ✓

------------------------------------------------------------
[INFO]  [5/5] 测试 docker save & load...
------------------------------------------------------------
[OK]    docker save 成功 ✓
[OK]    docker load 成功, 镜像已恢复 ✓

============================================================
[OK]    全部测试通过!
============================================================
```


---

## BPU Model Performance Benchmark Tool

### Overview

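Based on the `hrt_model_exec perf` command, this tool benchmarks the specified `.hbm` or `.bin` model files at thread counts 1, 2, 3, and 4, reports the average latency and FPS for each thread count, and saves the results to a JSON file.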

| Hardware Platform | BPU Architecture | Image Name |
|----------|----------|----------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | `hrt_perf_nashem:v3.7.3` |
| RDK X5 / RDK X5 Module | Bayes-e | `hrt_perf_bayese:v1.24.5` |
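The thread-count sweep described above amounts to one `hrt_model_exec perf` run per thread count. The sketch below builds those command lines using only the `--model_file`, `--thread_num`, and `--frame_count` options that appear in the interactive examples later in this document. `build_perf_commands` and `run_sweep` are hypothetical helpers, not code taken from `perf.py`.

```python
from __future__ import annotations

import subprocess

THREAD_NUMS = (1, 2, 3, 4)  # thread counts used by the tool, per the description above


def build_perf_commands(model_path: str, frame_count: int = 200,
                        thread_nums: tuple[int, ...] = THREAD_NUMS) -> list[list[str]]:
    """Build one `hrt_model_exec perf` argument list per thread count."""
    return [
        ["hrt_model_exec", "perf",
         "--model_file", model_path,
         "--thread_num", str(n),
         "--frame_count", str(frame_count)]
        for n in thread_nums
    ]


def run_sweep(model_path: str, frame_count: int = 200,
              thread_nums: tuple[int, ...] = THREAD_NUMS) -> list[dict]:
    """Run each command; keep the thread count, return code, and raw stdout."""
    results = []
    commands = build_perf_commands(model_path, frame_count, thread_nums)
    for n, cmd in zip(thread_nums, commands):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append({"thread_num": n, "returncode": proc.returncode,
                        "raw_output": proc.stdout})
    return results
```

Inside the container, `run_sweep("/workspace/input/your_model.hbm")` would perform the four runs. Parsing latency and FPS out of `raw_output` is left out, because this document does not show the raw `hrt_model_exec perf` output format.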


### Directory Structure

```
bpu_model_perf_images/
├── Dockerfile.nashem_perf          # RDK S100 (Nash-e/Nash-m) Dockerfile
├── Dockerfile.bayese_perf          # RDK X5 (Bayes-e) Dockerfile
├── docker_build_nashem.sh          # RDK S100 image build script
├── docker_build_bayese.sh          # RDK X5 image build script
├── docker_run_nashem_perf.sh       # RDK S100 automated perf run script
├── docker_run_bayese_perf.sh       # RDK X5 automated perf run script
├── docker_test_nashem.sh           # RDK S100 interactive container script
├── docker_test_bayese.sh           # RDK X5 interactive container script
├── workspace/
│   ├── perf.py                     # Main benchmark script (shared by both platforms)
│   └── entrypoint.sh               # Container entrypoint (shared by both platforms)
├── example_fs/                     # RDK S100 example input/output
│   ├── input/
│   │   ├── task.json               # Input protocol file
│   │   └── *.hbm                   # Model files (place them here)
│   └── output/
│       └── result.json             # Generated after a run
└── example_fs_bayese/              # RDK X5 example input/output
    ├── input/
    │   ├── task.json               # Input protocol file
    │   └── *.hbm / *.bin           # Model files (place them here)
    └── output/
        └── result.json             # Generated after a run
```

### Prebuilt Images

Prebuilt images can be pulled directly instead of being built locally:

```bash
# RDK X5 / RDK X5 Module
docker pull ccr-29eug8s3-pub.cnc.bj.baidubce.com/public/hrt_perf_bayese:v1.24.5

# RDK S100 / RDK S100P
docker pull ccr-29eug8s3-pub.cnc.bj.baidubce.com/public/hrt_perf_nashem:v3.7.3
```

### RDK S100 Series Usage (Nash-e / Nash-m)

**Step 1: Build the image** (only needed once; afterwards the image can be distributed as-is)

```bash
cd bpu_model_perf_images
bash docker_build_nashem.sh
```

The build script automatically locates `hrt_model_exec` on the board (via `which hrt_model_exec`) and packages it into the image, so no manual steps are needed.
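To confirm beforehand that the lookup will succeed, the same check can be done with Python's `shutil.which`, which searches `PATH` the way `which` does. This is a small sketch; `locate_tool` is a hypothetical helper name.

```python
from __future__ import annotations

import shutil


def locate_tool(name: str = "hrt_model_exec") -> str | None:
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = locate_tool()
    if path is None:
        print("hrt_model_exec was not found on PATH")
    else:
        print("hrt_model_exec found at", path)
```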

**Step 2: Prepare the input**

Place the model file (`.hbm`) in `example_fs/input/` and edit `example_fs/input/task.json`:

```json
{
  "model_relative_path": "your_model.hbm",
  "frame_count": 200
}
```

**Step 3: Run**

```bash
bash docker_run_nashem_perf.sh
```

The run script mounts `example_fs/input` as `/workspace/input` inside the container and `example_fs/output` as `/workspace/output`.

**Expected output**

```text
============================================================
task.json content:
{
  "model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
  "frame_count": 200
}
============================================================
[OK] 1 model(s) validated, frame_count=200

[Benchmarking] yolo11n_detect_nashe_640x640_nv12.hbm (/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm)
  thread_num=1 ... FPS=625.20  avg_latency=1.538ms
  thread_num=2 ... FPS=1133.43  avg_latency=1.703ms
  thread_num=3 ... FPS=1171.23  avg_latency=2.482ms
  thread_num=4 ... FPS=1167.98  avg_latency=3.322ms

================================================================================
+---------------------------------------+-----------+------------------+------------+
| Model                                 | Threads   | Avg Latency(ms)  | FPS        |
+---------------------------------------+-----------+------------------+------------+
| yolo11n_detect_nashe_640x640_nv12.hbm | 1         | 1.538            | 625.20     |
|                                       | 2         | 1.703            | 1133.43    |
|                                       | 3         | 2.482            | 1171.23    |
|                                       | 4         | 3.322            | 1167.98    |
+---------------------------------------+-----------+------------------+------------+

Results saved to: /workspace/output/result.json
```
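The mount layout described in Step 3 corresponds roughly to the `docker run` invocation built below. This is an assumption, not the contents of `docker_run_nashem_perf.sh`: the real script may pass additional options (for example the `--privileged` flag and the `/usr/hobot` and `/opt/hobot` mounts described for the interactive container). `build_docker_run_args` is a hypothetical helper; `--rm` and `-v host:container` are standard `docker run` options.

```python
from __future__ import annotations

import os


def build_docker_run_args(fs_root: str, image: str = "hrt_perf_nashem:v3.7.3") -> list[str]:
    """Build a `docker run` argv that mounts <fs_root>/input and <fs_root>/output."""
    # Docker expects an absolute host path on the left side of a bind mount.
    fs_root = os.path.abspath(fs_root)
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.join(fs_root, 'input')}:/workspace/input",
        "-v", f"{os.path.join(fs_root, 'output')}:/workspace/output",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(" ".join(build_docker_run_args("example_fs")))
```

For RDK X5, the same sketch would be called with `example_fs_bayese` and the `hrt_perf_bayese:v1.24.5` image.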


### RDK X5 Series Usage (Bayes-e)

**Step 1: Build the image**

```bash
cd bpu_model_perf_images
bash docker_build_bayese.sh
```

**Step 2: Prepare the input**

Place the model file (`.hbm` or `.bin`) in `example_fs_bayese/input/` and edit `example_fs_bayese/input/task.json`:

```json
{
  "model_relative_path": "your_model.hbm",
  "frame_count": 200
}
```

**Step 3: Run**

```bash
bash docker_run_bayese_perf.sh
```


### Entering the Container Interactively

The images include `hrt_model_exec` and other BPU tools, so you can enter a container interactively and debug models by hand.

**RDK S100 (Nash-e / Nash-m)**

```bash
cd bpu_model_perf_images
bash docker_test_nashem.sh
```

**RDK X5 (Bayes-e)**

```bash
cd bpu_model_perf_images
bash docker_test_bayese.sh
```

Inside the container, `hrt_model_exec` can be used directly, for example:

```bash
# Show help
hrt_model_exec --help

# Run perf manually
hrt_model_exec perf --model_file /workspace/input/xxx.hbm --thread_num 2 --frame_count 200

# Run inference manually
hrt_model_exec infer --model_file /workspace/input/xxx.hbm --input_file input.jpg
```

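> The container runs in `--privileged` mode with host directories such as `/usr/hobot` and `/opt/hobot` mounted, so its libraries are identical to those on the board. The container is removed automatically on exit (`--rm`).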


### task.json Protocol

Two formats are supported:

**Single model**

| Field | Type | Required | Description |
|------|------|------|------|
| `model_relative_path` | string | Yes | Model filename, relative to the input directory; must end in `.hbm` or `.bin` |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |

```json
{
  "model_relative_path": "model.hbm",
  "frame_count": 200
}
```

**Multiple models**

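| Field | Type | Required | Description |
|------|------|------|------|
| `models` | list | Yes | List of models |
| `models[].path` | string | Yes | Model filename, relative to the input directory |
| `models[].name` | string | No | Display name; defaults to the filename |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |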

```json
{
  "models": [
    {"name": "model_a", "path": "model_a.hbm"},
    {"name": "model_b", "path": "model_b.bin"}
  ],
  "frame_count": 200
}
```

> Note: `model_relative_path` and `models` must not appear in the same file.
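The rules in the two tables can be condensed into a small validator that turns either format into a uniform list of `(name, path)` pairs. This is a sketch of the stated rules only, not the validation code in `perf.py`. It rejects files that combine both formats, applies the default `frame_count` of 200, and checks the `.hbm` / `.bin` suffix. The tables only require that suffix for `model_relative_path`; applying it to `models[].path` as well is an assumption. `load_task` is a hypothetical name.

```python
from __future__ import annotations

import os

DEFAULT_FRAME_COUNT = 200
VALID_SUFFIXES = (".hbm", ".bin")


def load_task(task: dict) -> tuple[list[tuple[str, str]], int]:
    """Normalize a parsed task.json into ([(display_name, relative_path), ...], frame_count)."""
    has_single = "model_relative_path" in task
    has_multi = "models" in task
    if has_single and has_multi:
        raise ValueError("model_relative_path and models must not both be present")
    if has_single:
        entries = [{"path": task["model_relative_path"]}]
    elif has_multi:
        entries = task["models"]
        if not isinstance(entries, list):
            raise ValueError("models must be a list")
    else:
        raise ValueError("task.json needs model_relative_path or models")

    models = []
    for entry in entries:
        path = entry["path"]
        if not path.endswith(VALID_SUFFIXES):
            raise ValueError(f"unsupported model suffix: {path}")
        # The display name defaults to the file name when "name" is absent.
        models.append((entry.get("name", os.path.basename(path)), path))

    frame_count = int(task.get("frame_count", DEFAULT_FRAME_COUNT))
    return models, frame_count
```

Typical use is `load_task(json.load(f))` on an opened `task.json`; each returned path would then be joined with `/workspace/input` inside the container.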


### Output Format

Results are saved to `output/result.json` (`/workspace/output/result.json` inside the container) with the following structure:

```json
[
  {
    "model_name": "model.hbm",
    "model_path": "/workspace/input/model.hbm",
    "perf_results": [
      {
        "thread_num": 1,
        "frame_count": 200,
        "run_time_ms": 320.18,
        "total_latency_ms": 307.53,
        "avg_latency_ms": 1.538,
        "fps": 625.197,
        "raw_output": "...",
        "returncode": 0
      }
    ]
  }
]
```
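Because `result.json` is a list of models, each with one entry per thread count, finding the fastest configuration is a short loop. The sketch below assumes the structure shown above; skipping runs with a nonzero `returncode` is an assumption, and `best_thread_config` / `load_results` are hypothetical names.

```python
from __future__ import annotations

import json


def best_thread_config(results: list[dict]) -> dict[str, tuple[int, float]]:
    """Map each model_name to (thread_num, fps) of its highest-FPS successful run."""
    best = {}
    for model in results:
        ok_runs = [r for r in model["perf_results"] if r.get("returncode", 0) == 0]
        if not ok_runs:
            continue  # every run failed; nothing to report for this model
        top = max(ok_runs, key=lambda r: r["fps"])
        best[model["model_name"]] = (top["thread_num"], top["fps"])
    return best


def load_results(path: str = "example_fs/output/result.json") -> list[dict]:
    """Read result.json from the host-side output directory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Applied to the RDK S100 sample run shown earlier, this would pick `thread_num=3` (FPS 1171.23) rather than 4, since throughput stops improving after three threads while latency keeps rising.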

---

## CoreMark-PRO CPU Performance Benchmark Tool

### Overview

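Built on the standard CoreMark-PRO CPU benchmark suite, this tool runs a comprehensive performance test on ARM processors. The suite contains 9 representative workloads. The tool detects the number of CPU cores automatically, runs the workloads at every thread count from 1 up to the core count, and reports the final CoreMark-PRO score.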

| Hardware Platform | Architecture | Image Name |
|----------|------|----------|
| All ARM platforms | ARMv8 / aarch64 | `coremark_pro_rdk_arm:latest` |

CoreMark-PRO includes the following 9 workloads:

1. **cjpeg-rose7-preset** - JPEG compression
2. **core** - Enhanced CoreMark
3. **linear_alg-mid-100x100-sp** - Linear algebra (single-precision floating point)
4. **loops-all-mid-10k-sp** - Livermore Loops (single-precision floating point)
5. **nnet_test** - Neural network
6. **parser-125k** - XML parsing
7. **radix2-big-64k** - FFT (double-precision floating point)
8. **sha-test** - SHA-256 hashing
9. **zip-test** - ZIP compression
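Together, the nine workloads and the detected core count define the test matrix. The sketch below enumerates it with the thread count in the outer loop, mirroring the sample log (`--- Running with 1 thread(s) ---` followed by each workload). It uses `os.cpu_count()`, which returns the logical CPU count; whether `coremark_perf.py` uses the logical or physical count is not stated here, so that choice is an assumption. `thread_counts` and `run_plan` are hypothetical helpers.

```python
from __future__ import annotations

import os

# The nine CoreMark-PRO workloads listed above.
WORKLOADS = [
    "cjpeg-rose7-preset", "core", "linear_alg-mid-100x100-sp",
    "loops-all-mid-10k-sp", "nnet_test", "parser-125k",
    "radix2-big-64k", "sha-test", "zip-test",
]


def thread_counts(max_threads: int | None = None) -> list[int]:
    """Return [1, 2, ..., N]; N defaults to the logical CPU count."""
    if max_threads is None:
        # os.cpu_count() may return None when the count cannot be determined.
        max_threads = os.cpu_count() or 1
    if max_threads < 1:
        raise ValueError("max_threads must be >= 1")
    return list(range(1, max_threads + 1))


def run_plan(max_threads: int | None = None) -> list[tuple[str, int]]:
    """All (workload, thread_num) pairs, thread count in the outer loop."""
    return [(w, n) for n in thread_counts(max_threads) for w in WORKLOADS]
```

On an 8-core board such as the one in the sample output, `run_plan()` yields 9 × 8 = 72 workload runs.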

### Directory Structure

```
coremark_images/
├── Dockerfile                      # Image build file
├── docker_build.sh                 # Build script
├── docker_run.sh                   # Run script
├── docker_test.sh                  # Test script
├── workspace/
│   ├── coremark_perf.py            # Main benchmark script
│   └── entrypoint.sh               # Container entrypoint
├── coremark-pro/                   # CoreMark-PRO source code
│   └── builds/linux64/gcc64/bin/   # Prebuilt binaries
└── example_fs/
    └── output/                     # Default output directory
        └── result.json             # Generated after a run
```

### Usage

**Step 1: Build the image** (only needed once)

```bash
cd coremark_images
bash docker_build.sh
```

**Step 2: Run**

```bash
bash docker_run.sh
```

The run script mounts `example_fs/output` as `/workspace/output` inside the container.

**Expected output**

```text
==========================================
CoreMark-PRO CPU Performance Benchmark
==========================================

==========================================================================================
System Information
==========================================================================================
Hostname:        xxxxxxxx
Platform:        Linux-6.1.83-aarch64-with-glibc2.35
Architecture:    aarch64
CPU Model:       aarch64
CPU Count:       8 logical, 8 physical
CPU Frequency:   1800 MHz
Memory Total:    5768 MB
Memory Used:     48.2%
Timestamp:       2026-03-20 11:37:11
==========================================================================================

[Benchmark] Running CoreMark-PRO with thread counts: [1, 2, 3, 4, 5, 6, 7, 8]
[Benchmark] Total workloads: 9

--- Running with 1 thread(s) ---
  [1 threads] cjpeg-rose7-preset ... 59.88 iter/s
  [1 threads] core ... 0.46 iter/s
  [1 threads] linear_alg-mid-100x100-sp ... 24.74 iter/s
  ...

==========================================================================================
CoreMark-PRO Workload Results
==========================================================================================
Workload                       T=1         T=2         T=3         T=4     ...     Scaling
------------------------------------------------------------------------------------------
cjpeg-rose7-preset           59.88      117.65      169.49      232.56    ...        7.26
core                          0.46        0.91        1.37        1.82    ...        6.07
...
==========================================================================================

==========================================================================================
CoreMark-PRO Score Summary
==========================================================================================
Threads      CoreMark-PRO Score
------------------------------------------------------------------------------------------
1                       1390.51
2                       2684.16
...
8                       7027.42
------------------------------------------------------------------------------------------
Final CoreMark-PRO Score                    7027.42
==========================================================================================

[OK] Results saved to: /workspace/output/result.json
[OK] CoreMark-PRO Score: 7027.42
```

### Output Format

Results are saved to `example_fs/output/result.json` with the following structure:

```json
{
  "system_info": {
    "cpu": {
      "cpu_count": 8,
      "cpu_count_physical": 8,
      "cpu_freq_mhz": 1800.0,
      "cpu_model": "aarch64",
      "architecture": "aarch64"
    },
    "memory": {
      "total_mb": 5768,
      "available_mb": 2985,
      "percent_used": 48.2
    },
    "system": {
      "hostname": "rdk-board",
      "platform": "Linux-6.1.83-aarch64",
      "timestamp": "2026-03-20 11:37:11"
    }
  },
  "benchmark_config": {
    "workloads": ["cjpeg-rose7-preset", "core", ...],
    "thread_counts": [1, 2, 3, 4, 5, 6, 7, 8],
    "max_threads": 8
  },
  "results": [
    {
      "workload": "cjpeg-rose7-preset",
      "thread_num": 1,
      "iter_per_sec": 59.88,
      "returncode": 0,
      "elapsed_time_sec": 0.17
    }
  ],
  "coremark_pro_score": 7027.42
}
```

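| Field | Description |
|------|------|
| `system_info` | System information (CPU, memory, platform, etc.) |
| `benchmark_config` | Benchmark configuration (workload list, thread counts, etc.) |
| `results` | Performance data for each workload at each thread count |
| `coremark_pro_score` | Final CoreMark-PRO score |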
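To compare runs, for example across boards, the flat `results` list can be pivoted into a `{workload: {thread_num: iter_per_sec}}` table. The sketch assumes the `result.json` structure shown above; skipping entries with a nonzero `returncode` is an assumption, and `pivot_results` / `summarize` are hypothetical helpers.

```python
from __future__ import annotations

import json
from collections import defaultdict


def pivot_results(report: dict) -> dict[str, dict[int, float]]:
    """Return {workload: {thread_num: iter_per_sec}} for successful runs."""
    table: defaultdict[str, dict[int, float]] = defaultdict(dict)
    for r in report["results"]:
        if r.get("returncode", 0) != 0:
            continue  # skip failed runs
        table[r["workload"]][r["thread_num"]] = r["iter_per_sec"]
    return dict(table)


def summarize(path: str = "example_fs/output/result.json") -> None:
    """Print the overall score and one throughput row per workload."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    print("CoreMark-PRO score:", report["coremark_pro_score"])
    for workload, by_threads in sorted(pivot_results(report).items()):
        row = "  ".join(f"T={n}:{v:.2f}" for n, v in sorted(by_threads.items()))
        print(f"{workload:<28} {row}")
```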