update perf docker

2026-03-18 17:23:23 +08:00 · 2026-03-18 17:23:23 +08:00 · a7bda96863
commit a7bda96863
parent 03e2d3520d
17 changed files with 775 additions and 189 deletions
--- a/README.md
+++ b/README.md
@@ -1,29 +1,41 @@

- [Docker one-click install script](#docker一键安装脚本)
+- [Docker one-click install script](#docker-一键安装脚本)
  - [Supported platforms](#支持平台)
  - [Installation](#安装)
  - [Testing](#测试)
+- [BPU model performance test tool](#bpu-模型性能测试工具)
+  - [Tool overview](#工具说明)
+  - [Directory layout](#目录结构)
+  - [Usage on the RDK S100 series (Nash-e / Nash-m)](#rdk-s100-系列使用方法nash-e--nash-m)
+  - [Usage on the RDK X5 series (Bayes-e)](#rdk-x5-系列使用方法bayes-e)
+  - [Entering the container interactively](#交互式进入容器)
+  - [task.json protocol](#taskjson-协议说明)
+  - [Output format](#输出结果说明)


+---

 ## Docker one-click install script

 ### Supported platforms
-1. RDK X5 / RDK X5 Module (RDK OS version >= 3.3.3, MiniBoot version >= Sep-03-2025)
-2. RDK S100 / RDK S100P   (RDK OS version >= 4.0.4-Beta, Miniboot version >= 4.0.4-20251015222314)
+
+| Hardware platform | BPU architecture | Required RDK OS version | Required MiniBoot version |
+|----------|----------|-----------------|-------------------|
+| RDK S100 / RDK S100P | Nash-e / Nash-m | >= 4.0.4-Beta | >= 4.0.4-20251015222314 |
+| RDK X5 / RDK X5 Module | Bayes-e | >= 3.3.3 | >= Sep-03-2025 |


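 ### Installation

-Script path: `RDK_Docker_Tools/install_docker.sh`. The script installs Docker on the RDK board in one step and sets up the related network configuration.
+Script path: `RDK_Docker_Tools/install_docker/install_docker.sh`. The script installs Docker on the RDK board in one step and sets up the related network configuration.

-1. No reboot is needed between installation and use.
-2. The installation does not modify the eth0 network configuration, so SSH sessions stay connected.
-3. Install as the root user.
-4. Make sure the board has internet access and that `/` has 2GB of free space.
+1. No reboot is needed between installation and use.
+2. The installation does not modify the eth0 network configuration, so SSH sessions stay connected.
+3. Install as the root user.
+4. Make sure the board has internet access and that `/` has 2 GB of free space.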

 ```bash
-bash install_docker.sh
+bash install_docker/install_docker.sh
 ```

 Expected output
@@ -42,131 +54,23 @@ bash install_docker.sh
 [INFO]  注意: 本脚本不会修改 eth0 的任何配置
 [INFO]  检查 Docker 是否已安装...
 [INFO]  检查依赖命令...
-[WARN]  以下命令缺失: curl, 尝试安装...
-Reading package lists... Done
-Building dependency tree... Done
-Reading state information... Done
-The following additional packages will be installed:
-  libcurl4 libcurl4-openssl-dev
-Suggested packages:
-  libcurl4-doc libidn11-dev librtmp-dev libssh2-1-dev
-The following NEW packages will be installed:
-  curl
-The following packages will be upgraded:
-  libcurl4 libcurl4-openssl-dev
-2 upgraded, 1 newly installed, 0 to remove and 414 not upgraded.
-Need to get 190 kB/867 kB of archives.
-After this operation, 439 kB of additional disk space will be used.
-Get:1 http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports jammy-updates/main arm64 curl arm64 7.81.0-1ubuntu1.23 [190 kB]
-Fetched 190 kB in 0s (475 kB/s)
-perl: warning: Setting locale failed.
-perl: warning: Please check that your locale settings:
-        LANGUAGE = (unset),
-        LC_ALL = (unset),
-        LANG = "en_US.UTF-8"
-    are supported and installed on your system.
-perl: warning: Falling back to the standard locale ("C").
-locale: Cannot set LC_CTYPE to default locale: No such file or directory
-locale: Cannot set LC_MESSAGES to default locale: No such file or directory
-locale: Cannot set LC_ALL to default locale: No such file or directory
-(Reading database ... 219577 files and directories currently installed.)
-Preparing to unpack .../libcurl4-openssl-dev_7.81.0-1ubuntu1.23_arm64.deb ...
-Unpacking libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
-Preparing to unpack .../libcurl4_7.81.0-1ubuntu1.23_arm64.deb ...
-Unpacking libcurl4:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
-Selecting previously unselected package curl.
-Preparing to unpack .../curl_7.81.0-1ubuntu1.23_arm64.deb ...
-Unpacking curl (7.81.0-1ubuntu1.23) ...
-Setting up libcurl4:arm64 (7.81.0-1ubuntu1.23) ...
-Setting up curl (7.81.0-1ubuntu1.23) ...
-Setting up libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) ...
-Processing triggers for man-db (2.10.2-1) ...
-Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
-[OK]    依赖命令检查完毕 ✓
-[INFO]  检查网络连通性...
-[OK]    可访问 Ubuntu ports 源 (ports.ubuntu.com) ✓
-[INFO]  更新 apt 软件包缓存...
-[OK]    apt 缓存更新完毕 ✓
-[INFO]  安装必要依赖包...
-[INFO]  安装缺失依赖: apt-transport-https
 ...
-[OK]    依赖包就绪 ✓
-[INFO]  开始安装 Docker (docker.io)...
-...
-Adding group `docker' (GID 135) ...
-Done.
-Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
-Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
-Job for docker.service failed because the control process exited with error code.
-See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
-invoke-rc.d: initscript docker, action "start" failed.
-● docker.service - Docker Application Container Engine
-     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
-     Active: activating (auto-restart) (Result: exit-code) since Wed 2026-03-18 11:12:24 CST; 7ms ago
-TriggeredBy: ● docker.socket
-       Docs: https://docs.docker.com
-    Process: 231291 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
-   Main PID: 231291 (code=exited, status=1/FAILURE)
-Setting up dnsmasq-base (2.90-0ubuntu0.22.04.1) ...
-Setting up docker-buildx (0.21.3-0ubuntu1~22.04.1) ...
-Setting up ubuntu-fan (0.12.16) ...
-Created symlink /etc/systemd/system/multi-user.target.wants/ubuntu-fan.service → /lib/systemd/system/ubuntu-fan.service.
-Processing triggers for man-db (2.10.2-1) ...
-Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
-Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
 [OK]    Docker 安装完毕 ✓
-[INFO]  检查 iptables 后端兼容性...
-update-alternatives: using /usr/sbin/iptables-legacy to provide /usr/sbin/iptables (iptables) in manual mode
-update-alternatives: using /usr/sbin/ip6tables-legacy to provide /usr/sbin/ip6tables (ip6tables) in manual mode
-[OK]    已切换到 iptables-legacy 后端 ✓
 [INFO]  配置 Docker 镜像加速源...
-[WARN]  内核缺少 iptable_raw 模块, 将启用 allow-direct-routing 绕过 raw 表限制
 [OK]    镜像加速配置完毕 ✓
 [INFO]  启动 Docker 服务...
 [OK]    Docker 服务已通过 systemd 启动 ✓
-[INFO]  检查 Docker Hub 及镜像源 DNS 解析...
-[OK]    DNS hosts 修复完毕 ✓
-
-[INFO]  验证 Docker 安装...
-[OK]    Docker 版本: Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1
-[OK]    dockerd 进程运行中 ✓
-[OK]    Docker socket 存在: /var/run/docker.sock ✓
-[OK]    docker info 执行成功 ✓
-
-[INFO]  验证核心命令可用性...
-[OK]      docker pull --help ✓
-[OK]      docker push --help ✓
-[OK]      docker commit --help ✓
-[OK]      docker export --help ✓
-[OK]      docker load --help ✓
-[OK]      docker images ✓
-[OK]      docker ps ✓
-
+...
 ============================================================
 [OK]    Docker 安装完成!
 ============================================================
-
-  常用命令速查:
-    docker pull <镜像>             # 拉取镜像
-    docker push <镜像>             # 推送镜像
-    docker commit <容器> <镜像>    # 提交容器为镜像
-    docker export <容器> -o x.tar  # 导出容器
-    docker load -i <文件.tar>      # 加载镜像
-    docker images                  # 查看本地镜像
-    docker ps -a                   # 查看所有容器
-
-  注意事项:
-    - eth0 网络配置未做任何修改
-    - 本次安装无需重启系统
-    - 镜像加速配置: /etc/docker/daemon.json
-    - Docker 日志: journalctl -u docker 或 /var/log/dockerd.log
 ```


 ### Testing

 ```bash
-bash test_docker.sh
+bash install_docker/test_docker.sh
 ```

 Expected output
@@ -182,99 +86,263 @@ bash test_docker.sh
 ------------------------------------------------------------
 [INFO]  [1/5] 测试 docker pull...
 ------------------------------------------------------------
-[INFO]  尝试拉取: hello-world
-Using default tag: latest
-latest: Pulling from library/hello-world
-198f93fd5094: Pull complete 
-Digest: sha256:85404b3c53951c3ff5d40de0972b1bb21fafa2e8daa235355baf44f33db9dbdd
-Status: Downloaded newer image for hello-world:latest
-docker.io/library/hello-world:latest
 [OK]    docker pull 成功 ✓

 ------------------------------------------------------------
 [INFO]  [2/5] 测试 docker run...
 ------------------------------------------------------------
-
-Hello from Docker!
-This message shows that your installation appears to be working correctly.
-
-To generate this message, Docker took the following steps:
- 1. The Docker client contacted the Docker daemon.
- 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
-    (arm64v8)
- 3. The Docker daemon created a new container from that image which runs the
-    executable that produces the output you are currently reading.
- 4. The Docker daemon streamed that output to the Docker client, which sent it
-    to your terminal.
-
-To try something more ambitious, you can run an Ubuntu container with:
- $ docker run -it ubuntu bash
-
-Share images, automate workflows, and more with a free Docker ID:
- https://hub.docker.com/
-
-For more examples and ideas, visit:
- https://docs.docker.com/get-started/
-
 [OK]    docker run 成功 ✓

 ------------------------------------------------------------
 [INFO]  [3/5] 测试 docker commit...
 ------------------------------------------------------------
-sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
 [OK]    docker commit 成功 -> hello-world-committed:test ✓
-[OK]    commit 镜像验证通过 ✓

 ------------------------------------------------------------
 [INFO]  [4/5] 测试 docker export...
 ------------------------------------------------------------
-[OK]    docker export 成功 -> /tmp/docker_test_232981/hello-world-export.tar (16K) ✓
+[OK]    docker export 成功 ✓

 ------------------------------------------------------------
 [INFO]  [5/5] 测试 docker save & load...
 ------------------------------------------------------------
-[OK]    docker save 成功 -> /tmp/docker_test_232981/hello-world-load.tar (20K) ✓
-Untagged: hello-world-committed:test
-Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
-[INFO]  已删除本地镜像 hello-world-committed:test, 准备 load...
-Loaded image: hello-world-committed:test
+[OK]    docker save 成功 ✓
 [OK]    docker load 成功, 镜像已恢复 ✓

------------------------------------------------------------
-[INFO]  [附] docker push 说明
------------------------------------------------------------
-[WARN]  docker push 需要登录镜像仓库, 本脚本不自动执行
-  推送到 Docker Hub:
-    docker login
-    docker tag hello-world-committed:test <你的用户名>/hello-world:test
-    docker push <你的用户名>/hello-world:test
-
-  推送到私有仓库:
-    docker tag hello-world-committed:test <仓库地址>/hello-world:test
-    docker push <仓库地址>/hello-world:test
-
------------------------------------------------------------
-[INFO]  清理测试资源...
------------------------------------------------------------
-docker_test_232981
-[OK]    容器已清理 ✓
-Untagged: hello-world-committed:test
-Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
-[OK]    commit 镜像已清理 ✓
-[OK]    临时文件已清理 ✓
-
 ============================================================
 [OK]    全部测试通过!
 ============================================================
-
-  测试结果汇总:
-    ✓ docker pull   - 镜像拉取
-    ✓ docker run    - 容器运行
-    ✓ docker commit - 容器提交为镜像
-    ✓ docker export - 容器导出为 tar
-    ✓ docker save   - 镜像保存为 tar
-    ✓ docker load   - 从 tar 加载镜像
-    - docker push   - 需手动配置仓库后执行
 ```


+---
+
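+## BPU model performance test tool
+
+### Tool overview
+
+Built on the `hrt_model_exec perf` command, the tool benchmarks the specified `.hbm` or `.bin` model files at thread counts 1, 2, 3, and 4, prints the average latency and FPS for each thread count, and saves the results to a JSON file.
+
+| Hardware platform | BPU architecture | Image name |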
+|----------|----------|----------|
+| RDK S100 / RDK S100P | Nash-e / Nash-m | `hrt_perf_nashem:v3.7.3` |
+| RDK X5 / RDK X5 Module | Bayes-e | `hrt_perf_bayese:v1.24.5` |
+
+
+### Directory layout
+
+```
+bpu_model_perf_images/
+├── Dockerfile.nashem_perf          # Dockerfile for RDK S100 (Nash-e/Nash-m)
+├── Dockerfile.bayese_perf          # Dockerfile for RDK X5 (Bayes-e)
+├── docker_build_nashem.sh          # Image build script for RDK S100
+├── docker_build_bayese.sh          # Image build script for RDK X5
+├── docker_run_nashem_perf.sh       # Automated perf run script for RDK S100
+├── docker_run_bayese_perf.sh       # Automated perf run script for RDK X5
+├── docker_test_nashem.sh           # Interactive container script for RDK S100
+├── docker_test_bayese.sh           # Interactive container script for RDK X5
+├── workspace/
+│   ├── perf.py                     # Main benchmark script (shared by both platforms)
+│   └── entrypoint.sh               # Container entrypoint script (shared by both platforms)
+├── example_fs/                     # Example input/output for RDK S100
+│   ├── input/
+│   │   ├── task.json               # Input protocol file
+│   │   └── *.hbm                   # Model files (place them here)
+│   └── output/
+│       └── result.json             # Generated automatically after a run
+└── example_fs_bayese/              # Example input/output for RDK X5
+    ├── input/
+    │   ├── task.json               # Input protocol file
+    │   └── *.hbm / *.bin           # Model files (place them here)
+    └── output/
+        └── result.json             # Generated automatically after a run
+```
+
+
+### Usage on the RDK S100 series (Nash-e / Nash-m)
+
+**Step 1: Build the image** (build once; afterwards the image can be delivered as-is)
+
+```bash
+cd bpu_model_perf_images
+bash docker_build_nashem.sh
+```
+
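+The build script locates `hrt_model_exec` on the board (via `which hrt_model_exec`) and packages it into the image; no manual step is needed.
+
+**Step 2: Prepare the input**
+
+Place the model file (`.hbm`) in `example_fs/input/` and edit `example_fs/input/task.json`: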
+
+```json
+{
+  "model_relative_path": "your_model.hbm",
+  "frame_count": 200
+}
+```
+
+**Step 3: Run**
+
+```bash
+bash docker_run_nashem_perf.sh
+```
+
+The run script mounts `example_fs/input` at `/workspace/input` inside the container and `example_fs/output` at `/workspace/output`.
+
+**Expected output**
+
+```text
+============================================================
+task.json content:
+{
+  "model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
+  "frame_count": 200
+}
+============================================================
+[OK] 1 model(s) validated, frame_count=200
+
+[Benchmarking] yolo11n_detect_nashe_640x640_nv12.hbm (/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm)
+  thread_num=1 ... FPS=625.20  avg_latency=1.538ms
+  thread_num=2 ... FPS=1133.43  avg_latency=1.703ms
+  thread_num=3 ... FPS=1171.23  avg_latency=2.482ms
+  thread_num=4 ... FPS=1167.98  avg_latency=3.322ms
+
+================================================================================
+---------------------------------------+-----------+------------------+------------+
+| Model                                 | Threads   | Avg Latency(ms)  | FPS        |
+---------------------------------------+-----------+------------------+------------+
+| yolo11n_detect_nashe_640x640_nv12.hbm | 1         | 1.538            | 625.20     |
+|                                       | 2         | 1.703            | 1133.43    |
+|                                       | 3         | 2.482            | 1171.23    |
+|                                       | 4         | 3.322            | 1167.98    |
+---------------------------------------+-----------+------------------+------------+
+
+Results saved to: /workspace/output/result.json
+```
+
+
+### Usage on the RDK X5 series (Bayes-e)
+
+**Step 1: Build the image**
+
+```bash
+cd bpu_model_perf_images
+bash docker_build_bayese.sh
+```
+
+**Step 2: Prepare the input**
+
+Place the model file (`.hbm` or `.bin`) in `example_fs_bayese/input/` and edit `example_fs_bayese/input/task.json`:
+
+```json
+{
+  "model_relative_path": "your_model.hbm",
+  "frame_count": 200
+}
+```
+
+**Step 3: Run**
+
+```bash
+bash docker_run_bayese_perf.sh
+```
+
+
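+### Entering the container interactively
+
+The image bundles BPU tools such as `hrt_model_exec`, so you can open an interactive shell in the container and debug models by hand.
+
+**RDK S100 (Nash-e / Nash-m)**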
+
+```bash
+cd bpu_model_perf_images
+bash docker_test_nashem.sh
+```
+
+**RDK X5 (Bayes-e)**
+
+```bash
+cd bpu_model_perf_images
+bash docker_test_bayese.sh
+```
+
+Inside the container, `hrt_model_exec` can be used directly, for example:
+
+```bash
+# Show help
+hrt_model_exec --help
+
+# Run perf manually
+hrt_model_exec perf --model_file /workspace/input/xxx.hbm --thread_num 2 --frame_count 200
+
+# Run inference manually
+hrt_model_exec infer --model_file /workspace/input/xxx.hbm --input_file input.jpg
+```
+
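+> The container starts in `--privileged` mode with host directories such as `/usr/hobot` and `/opt/hobot` mounted, so the libraries match those on the board exactly. The container is removed automatically on exit (`--rm`).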
+
+
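+### task.json protocol
+
+Two formats are supported:
+
+**Single model**
+
+| Field | Type | Required | Description |
+|------|------|------|------|
+| `model_relative_path` | string | Yes | Model file name, relative to the input directory; the suffix must be `.hbm` or `.bin` |
+| `frame_count` | int | No | Number of frames to test, default 200 |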
+
+```json
+{
+  "model_relative_path": "model.hbm",
+  "frame_count": 200
+}
+```
+
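+**Multiple models**
+
+| Field | Type | Required | Description |
+|------|------|------|------|
+| `models` | list | Yes | List of models |
+| `models[].path` | string | Yes | Model file name, relative to the input directory |
+| `models[].name` | string | No | Display name, defaults to the file name |
+| `frame_count` | int | No | Number of frames to test, default 200 |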
+
+```json
+{
+  "models": [
+    {"name": "model_a", "path": "model_a.hbm"},
+    {"name": "model_b", "path": "model_b.bin"}
+  ],
+  "frame_count": 200
+}
+```
+
+> Note: `model_relative_path` and `models` must not appear together.
+
+
+### Output format
+
+Results are saved to `output/result.json` with the following structure:
+
+```json
+[
+  {
+    "model_name": "model.hbm",
+    "model_path": "/workspace/input/model.hbm",
+    "perf_results": [
+      {
+        "thread_num": 1,
+        "frame_count": 200,
+        "run_time_ms": 320.18,
+        "total_latency_ms": 307.53,
+        "avg_latency_ms": 1.538,
+        "fps": 625.197,
+        "raw_output": "...",
+        "returncode": 0
+      }
+    ]
+  }
+]
+```
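Since `result.json` is a list with one entry per model and one `perf_results` record per thread count, picking the best-performing thread count is a small post-processing step. The sketch below is one way to do it in Python; `load_results` and `best_by_fps` are hypothetical helper names that are not part of this commit, and the code assumes only the fields shown in the structure above.

```python
import json
from pathlib import Path


def load_results(path: str) -> list:
    """Load result.json, which is a list with one entry per model."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def best_by_fps(results: list) -> dict:
    """Map each model_name to its successful run with the highest FPS."""
    best = {}
    for model in results:
        # Skip runs that failed or whose FPS could not be parsed (left as null).
        usable = [
            run for run in model["perf_results"]
            if run["returncode"] == 0 and run["fps"] is not None
        ]
        if usable:
            best[model["model_name"]] = max(usable, key=lambda run: run["fps"])
    return best
```

For example, `best_by_fps(load_results("example_fs/output/result.json"))` would return one record per model, from which `thread_num`, `fps`, and `avg_latency_ms` can be read.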
--- a/bpu_model_perf_images/Dockerfile.bayese_perf
+++ b/bpu_model_perf_images/Dockerfile.bayese_perf
@@ -0,0 +1,65 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# ============================================================
+# Base tools
+# SSH server installation and configuration
+# ============================================================
+RUN apt-get update && apt-get install -y \
+    python3 \
+    python3-pip \
+    python3-venv \
+    git \
+    wget \
+    curl \
+    vim \
+    sshpass \
+    i2c-tools \
+    software-properties-common \
+    gnupg2 \
+    lsb-release \
+    openssh-server \
+    locales \
+    && mkdir -p /var/run/sshd \
+    && rm -rf /var/lib/apt/lists/*
+
+# Configure the Chinese locale (prevents garbled Chinese text in the terminal)
+RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
+ENV LANG=zh_CN.UTF-8
+ENV LC_ALL=zh_CN.UTF-8
+
+# SSH configuration: allow root password login, limit auth attempts and idle sessions
+RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
+    && sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
+    && echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
+    && echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
+    && echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
+    && echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
+    && echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config
+
+# Generate SSH host keys (normally generated on first container start; pre-generated here to speed up startup)
+RUN ssh-keygen -A
+
+# Set the root password
+RUN echo 'root:root' | chpasswd
+
+# ============================================================
+# BPU model perf tool
+# ============================================================
+
+# Copy hrt_model_exec (taken from its path on the host board)
+COPY hrt_model_exec /usr/local/bin/hrt_model_exec
+RUN chmod +x /usr/local/bin/hrt_model_exec
+
+# Copy the perf script and entrypoint
+COPY workspace/perf.py /workspace/perf/perf.py
+COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
+RUN chmod +x /workspace/perf/entrypoint.sh
+
+# Working directory and mount points
+RUN mkdir -p /workspace/input /workspace/output
+
+WORKDIR /workspace/perf
+
+ENTRYPOINT ["/workspace/perf/entrypoint.sh"]
--- a/bpu_model_perf_images/Dockerfile.nashem_perf
+++ b/bpu_model_perf_images/Dockerfile.nashem_perf
@@ -0,0 +1,66 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# ============================================================
+# Base tools
+# SSH server installation and configuration
+# ============================================================
+RUN apt-get update && apt-get install -y \
+    python3 \
+    python3-pip \
+    python3-venv \
+    git \
+    wget \
+    curl \
+    vim \
+    sshpass \
+    i2c-tools \
+    software-properties-common \
+    gnupg2 \
+    lsb-release \
+    openssh-server \
+    locales \
+    && mkdir -p /var/run/sshd \
+    && rm -rf /var/lib/apt/lists/*
+
+# Configure the Chinese locale (prevents garbled Chinese text in the terminal)
+RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
+ENV LANG=zh_CN.UTF-8
+ENV LC_ALL=zh_CN.UTF-8
+
+# SSH configuration: allow root password login, limit auth attempts and idle sessions
+RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
+    && sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
+    && echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
+    && echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
+    && echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
+    && echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
+    && echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config
+
+# Generate SSH host keys (normally generated on first container start; pre-generated here to speed up startup)
+RUN ssh-keygen -A
+
+# Set the root password
+RUN echo 'root:root' | chpasswd
+
+# ============================================================
+# BPU model perf tool
+# ============================================================
+
+# Copy hrt_model_exec (taken from its path on the host board)
+COPY hrt_model_exec /usr/local/bin/hrt_model_exec
+RUN chmod +x /usr/local/bin/hrt_model_exec
+
+# Copy the perf script and entrypoint
+COPY workspace/perf.py /workspace/perf/perf.py
+COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
+RUN chmod +x /workspace/perf/entrypoint.sh
+
+# Working directory and mount points
+RUN mkdir -p /workspace/input /workspace/output
+
+WORKDIR /workspace/perf
+
+ENTRYPOINT ["/workspace/perf/entrypoint.sh"]
+
--- a/bpu_model_perf_images/docker_build_bayese.sh
+++ b/bpu_model_perf_images/docker_build_bayese.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# Copy hrt_model_exec into build context
+HRT_BIN="$(which hrt_model_exec)"
+echo "Copying hrt_model_exec from $HRT_BIN"
+cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"
+
+docker build -f "$SCRIPT_DIR/Dockerfile.bayese_perf" -t hrt_perf_bayese:v1.24.5 "$SCRIPT_DIR"
+
+# Clean up copied binary
+rm -f "$SCRIPT_DIR/hrt_model_exec"
--- a/bpu_model_perf_images/docker_build_nashem.sh
+++ b/bpu_model_perf_images/docker_build_nashem.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# Copy hrt_model_exec into build context
+HRT_BIN="$(which hrt_model_exec)"
+echo "Copying hrt_model_exec from $HRT_BIN"
+cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"
+
+docker build -f "$SCRIPT_DIR/Dockerfile.nashem_perf" -t hrt_perf_nashem:v3.7.3 "$SCRIPT_DIR"
+
+# Clean up copied binary
+rm -f "$SCRIPT_DIR/hrt_model_exec"
--- a/bpu_model_perf_images/docker_run_bayese_perf.sh
+++ b/bpu_model_perf_images/docker_run_bayese_perf.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+# Run the bayese perf container
+# Input:  /workspace/input/task.json  (mounted from host)
+# Output: /workspace/output/result.json (mounted from host)
+
+docker run --rm \
+  --name hrt-perf-bayese \
+  --privileged \
+  -v /opt/hobot:/opt/hobot:ro \
+  -v /usr/hobot:/usr/hobot:ro \
+  -v /opt/tros:/opt/tros:ro \
+  -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
+  -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
+  -v "$(pwd)/example_fs_bayese/input":/workspace/input:ro \
+  -v "$(pwd)/example_fs_bayese/output":/workspace/output \
+  hrt_perf_bayese:v1.24.5
--- a/bpu_model_perf_images/docker_run_nashem_perf.sh
+++ b/bpu_model_perf_images/docker_run_nashem_perf.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+# Run the nashem perf container
+# Input:  /workspace/input/task.json  (mounted from host)
+# Output: /workspace/output/result.json (mounted from host)
+
+docker run --rm \
+  --name hrt-perf-nashem \
+  --privileged \
+  -v /opt/hobot:/opt/hobot:ro \
+  -v /usr/hobot:/usr/hobot:ro \
+  -v /opt/tros:/opt/tros:ro \
+  -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
+  -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
+  -v "$(pwd)/example_fs/input":/workspace/input:ro \
+  -v "$(pwd)/example_fs/output":/workspace/output \
+  hrt_perf_nashem:v3.7.3
--- a/bpu_model_perf_images/docker_test_bayese.sh
+++ b/bpu_model_perf_images/docker_test_bayese.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# Start the Bayes-e perf container interactively, with /bin/bash as the entrypoint
+# For interactive use of the BPU inside the container (hrt_model_exec and other tools)
+
+docker run -it --rm \
+  --name hrt-perf-bayese-interactive \
+  --privileged \
+  -v /opt/hobot:/opt/hobot:ro \
+  -v /usr/hobot:/usr/hobot:ro \
+  -v /opt/tros:/opt/tros:ro \
+  -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
+  -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
+  -e ROS_DOMAIN_ID=0 \
+  --entrypoint /bin/bash \
+  hrt_perf_bayese:v1.24.5
--- a/bpu_model_perf_images/docker_test_nashem.sh
+++ b/bpu_model_perf_images/docker_test_nashem.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# Start the Nash-e / Nash-m perf container interactively, with /bin/bash as the entrypoint
+# For interactive use of the BPU inside the container (hrt_model_exec and other tools)
+
+docker run -it --rm \
+  --name hrt-perf-nashem-interactive \
+  --privileged \
+  -v /opt/hobot:/opt/hobot:ro \
+  -v /usr/hobot:/usr/hobot:ro \
+  -v /opt/tros:/opt/tros:ro \
+  -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
+  -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
+  -e ROS_DOMAIN_ID=0 \
+  --entrypoint /bin/bash \
+  hrt_perf_nashem:v3.7.3
--- a/bpu_model_perf_images/example_fs/input/task.json
+++ b/bpu_model_perf_images/example_fs/input/task.json
@@ -0,0 +1,4 @@
+{
+  "model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
+  "frame_count": 20
+}
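The README describes two `task.json` layouts (a single `model_relative_path`, or a `models` list) that must not be combined. The sketch below shows one way to fold both into a single internal shape before benchmarking. `normalize_task`, `ALLOWED_SUFFIXES`, and `DEFAULT_FRAME_COUNT` are hypothetical names, not taken from `perf.py`, and applying the `.hbm`/`.bin` suffix check to the multi-model form is an assumption (the README states it only for `model_relative_path`).

```python
from pathlib import PurePosixPath

ALLOWED_SUFFIXES = {".hbm", ".bin"}   # suffixes named in the README
DEFAULT_FRAME_COUNT = 200             # documented default for frame_count


def normalize_task(task: dict) -> tuple:
    """Return (models, frame_count) for either task.json layout."""
    has_single = "model_relative_path" in task
    has_multi = "models" in task
    # Exactly one of the two keys must be present.
    if has_single == has_multi:
        raise ValueError("provide exactly one of 'model_relative_path' or 'models'")

    entries = [{"path": task["model_relative_path"]}] if has_single else task["models"]
    if not entries:
        raise ValueError("'models' must not be empty")

    models = []
    for entry in entries:
        path = entry["path"]
        if PurePosixPath(path).suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported model suffix: {path}")
        # The display name falls back to the file name, as documented.
        models.append({"name": entry.get("name", PurePosixPath(path).name), "path": path})

    return models, int(task.get("frame_count", DEFAULT_FRAME_COUNT))
```

Normalizing up front keeps the benchmarking loop identical for both layouts: it only ever sees a list of `{"name", "path"}` entries and one frame count.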
--- a/bpu_model_perf_images/example_fs/input/yolo11n_detect_nashe_640x640_nv12.hbm
+++ b/bpu_model_perf_images/example_fs/input/yolo11n_detect_nashe_640x640_nv12.hbm
--- a/bpu_model_perf_images/example_fs/output/result.json
+++ b/bpu_model_perf_images/example_fs/output/result.json
@@ -0,0 +1,48 @@
+[
+  {
+    "model_name": "yolo11n_detect_nashe_640x640_nv12.hbm",
+    "model_path": "/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm",
+    "perf_results": [
+      {
+        "thread_num": 1,
+        "frame_count": 20,
+        "run_time_ms": 32.467,
+        "total_latency_ms": 30.841,
+        "avg_latency_ms": 1.542,
+        "fps": 620.79,
+        "raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 1 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data.  \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473143438048][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 352.489ms.\n\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n  Thread number is: 1\n  Frame count   is: 20\n  Program run time: 32.467 ms\nPerf result:\n  Frame totally latency is: 30.841 ms\n  Average    latency    is: 1.542 ms\n  Frame      rate       is: 620.790 FPS\n",
+        "returncode": 0
+      },
+      {
+        "thread_num": 2,
+        "frame_count": 20,
+        "run_time_ms": 19.705,
+        "total_latency_ms": 36.596,
+        "avg_latency_ms": 1.837,
+        "fps": 1016.369,
+        "raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 2 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data.  \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472894728928][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 356.068ms.\n\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n  Thread number is: 2\n  Frame count   is: 20\n  Program run time: 19.705 ms\nPerf result:\n  Frame totally latency is: 36.596 ms\n  Average    latency    is: 1.837 ms\n  Frame      rate       is: 1016.369 FPS\n",
+        "returncode": 0
+      },
+      {
+        "thread_num": 3,
+        "frame_count": 20,
+        "run_time_ms": 18.084,
+        "total_latency_ms": 49.101,
+        "avg_latency_ms": 2.456,
+        "fps": 1111.797,
+        "raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 3 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data.  \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473576172256][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.722ms.\n\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n  Thread number is: 3\n  Frame count   is: 20\n  Program run time: 18.084 ms\nPerf result:\n  Frame totally latency is: 49.101 ms\n  Average    latency    is: 2.456 ms\n  Frame      rate       is: 1111.797 FPS\n",
+        "returncode": 0
+      },
+      {
+        "thread_num": 4,
+        "frame_count": 20,
+        "run_time_ms": 18.174,
+        "total_latency_ms": 64.363,
+        "avg_latency_ms": 3.217,
+        "fps": 1092.826,
+        "raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 4 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data.  \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472884636384][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.186ms.\n\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n  Thread number is: 4\n  Frame count   is: 20\n  Program run time: 18.174 ms\nPerf result:\n  Frame totally latency is: 64.363 ms\n  Average    latency    is: 3.217 ms\n  Frame      rate       is: 1092.826 FPS\n",
+        "returncode": 0
+      }
+    ]
+  }
+]
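The result.json layout shown above (a list of models, each carrying a `perf_results` list) is easy to post-process. The following is a minimal sketch, assuming the file follows that structure; `best_thread_config` is a hypothetical helper name and is not part of the tool, and the model name in the inline sample is only illustrative.

```python
import json


def best_thread_config(results: list) -> dict:
    """Map each model_name to the perf entry with the highest FPS.

    Entries from failed runs (fps is None or returncode != 0) are skipped.
    """
    best = {}
    for entry in results:
        ok = [
            p for p in entry["perf_results"]
            if p.get("fps") is not None and p.get("returncode") == 0
        ]
        if ok:
            best[entry["model_name"]] = max(ok, key=lambda p: p["fps"])
    return best


# Inline sample built from the thread_num 3 and 4 figures shown above.
sample = [{
    "model_name": "yolo11n_detect_nashe_640x640_nv12.hbm",
    "perf_results": [
        {"thread_num": 3, "fps": 1111.797, "avg_latency_ms": 2.456, "returncode": 0},
        {"thread_num": 4, "fps": 1092.826, "avg_latency_ms": 3.217, "returncode": 0},
    ],
}]

for name, p in best_thread_config(sample).items():
    print(json.dumps({"model": name, "thread_num": p["thread_num"], "fps": p["fps"]}))
```

To use it on a real run, replace `sample` with the list loaded by `json.load` from the generated result.json.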
--- a/bpu_model_perf_images/example_fs_bayese/input/task.json
+++ b/bpu_model_perf_images/example_fs_bayese/input/task.json
@ -0,0 +1,4 @@
+{
+  "model_relative_path": "your_bayese_model.hbm",
+  "frame_count": 200
+}
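Besides the single-model form above, `validate_config` in perf.py also accepts a `models` list whose entries have a required `path` and an optional `name`. The sketch below builds both forms as Python dicts and serializes one of them; the file names `model_a.hbm` and `subdir/model_b.bin` are placeholders, and `dump_task` is a hypothetical helper.

```python
import json

# Single-model form, matching the example task.json above.
single_task = {
    "model_relative_path": "your_bayese_model.hbm",
    "frame_count": 200,
}

# Multi-model form. Paths are relative to the directory holding task.json.
multi_task = {
    "models": [
        {"name": "model_a", "path": "model_a.hbm"},
        {"path": "subdir/model_b.bin"},  # name falls back to the file name
    ],
    "frame_count": 200,
}


def dump_task(task: dict) -> str:
    """Serialize a task dict the way a task.json file would store it."""
    return json.dumps(task, indent=2, ensure_ascii=False)


print(dump_task(multi_task))
```

The two forms are mutually exclusive: a file that contains both `model_relative_path` and `models` is rejected as ambiguous.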
--- a/bpu_model_perf_images/workspace/entrypoint.sh
+++ b/bpu_model_perf_images/workspace/entrypoint.sh
@ -0,0 +1,20 @@
+#!/bin/bash
+set -e
+
+INPUT_DIR="/workspace/input"
+OUTPUT_DIR="/workspace/output"
+INPUT_JSON="${INPUT_DIR}/task.json"
+OUTPUT_JSON="${OUTPUT_DIR}/result.json"
+
+# Allow overrides via env vars
+INPUT_JSON="${PERF_INPUT:-$INPUT_JSON}"
+OUTPUT_JSON="${PERF_OUTPUT:-$OUTPUT_JSON}"
+
+if [ ! -f "$INPUT_JSON" ]; then
+    echo "ERROR: input file not found: $INPUT_JSON" >&2
+    exit 1
+fi
+
+mkdir -p "$(dirname "$OUTPUT_JSON")"
+
+exec python3 /workspace/perf/perf.py --input "$INPUT_JSON" --output "$OUTPUT_JSON"
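The override logic in entrypoint.sh relies on bash's `${VAR:-default}` expansion, which falls back to the default when the variable is unset or empty. A rough Python equivalent, given only to illustrate that behavior (`resolve_paths` is a hypothetical name, not something the image ships):

```python
import os

DEFAULT_INPUT = "/workspace/input/task.json"
DEFAULT_OUTPUT = "/workspace/output/result.json"


def resolve_paths(env=None):
    """Return (input_json, output_json) after applying env overrides."""
    env = os.environ if env is None else env
    # `or` mirrors ${VAR:-default}: both unset and empty fall back.
    input_json = env.get("PERF_INPUT") or DEFAULT_INPUT
    output_json = env.get("PERF_OUTPUT") or DEFAULT_OUTPUT
    return input_json, output_json


print(resolve_paths({"PERF_OUTPUT": "/tmp/result.json"}))
```

Setting `PERF_INPUT` also changes where models are looked up, since perf.py resolves model paths relative to the directory containing the input JSON.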
--- a/bpu_model_perf_images/workspace/perf.py
+++ b/bpu_model_perf_images/workspace/perf.py
@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+"""
+BPU Model Performance Benchmark Tool
+Runs hrt_model_exec perf for each model at thread counts 1,2,3,4
+"""
+
+import argparse
+import json
+import re
+import subprocess
+import sys
+from pathlib import Path
+
+HRT_MODEL_EXEC = "hrt_model_exec"
+THREAD_COUNTS = [1, 2, 3, 4]
+
+
+def run_perf(model_path: str, thread_num: int, frame_count: int = 200) -> dict:
+    """Run hrt_model_exec perf and return parsed results."""
+    cmd = [
+        HRT_MODEL_EXEC, "perf",
+        "--model_file", model_path,
+        "--thread_num", str(thread_num),
+        "--frame_count", str(frame_count),
+    ]
+    result = subprocess.run(cmd, capture_output=True, text=True)
+    output = result.stdout + result.stderr
+
+    perf = {
+        "thread_num": thread_num,
+        "frame_count": frame_count,
+        "run_time_ms": None,
+        "total_latency_ms": None,
+        "avg_latency_ms": None,
+        "fps": None,
+        "raw_output": output,
+        "returncode": result.returncode,
+    }
+
+    m = re.search(r"Program run time:\s*([\d.]+)\s*ms", output)
+    if m:
+        perf["run_time_ms"] = float(m.group(1))
+
+    m = re.search(r"Frame totally latency is:\s*([\d.]+)\s*ms", output)
+    if m:
+        perf["total_latency_ms"] = float(m.group(1))
+
+    m = re.search(r"Average\s+latency\s+is:\s*([\d.]+)\s*ms", output)
+    if m:
+        perf["avg_latency_ms"] = float(m.group(1))
+
+    m = re.search(r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS", output)
+    if m:
+        perf["fps"] = float(m.group(1))
+
+    return perf
+
+
+def print_table(results: list):
+    """Print results as a human-readable table."""
+    # Size the Model column to fit the longest model name
+    max_name_len = max(len(e["model_name"]) for e in results)
+    col_widths = [max(max_name_len, 10), 9, 16, 10]
+    headers = ["Model", "Threads", "Avg Latency(ms)", "FPS"]
+
+    sep = "+" + "+".join("-" * (w + 2) for w in col_widths) + "+"
+    header_row = "|" + "|".join(
+        f" {h:<{w}} " for h, w in zip(headers, col_widths)
+    ) + "|"
+
+    for entry in results:
+        name = entry["model_name"]
+        # Print a separate header for each model
+        print(sep)
+        print(header_row)
+        print(sep)
+        for p in entry["perf_results"]:
+            avg = f"{p['avg_latency_ms']:.3f}" if p["avg_latency_ms"] is not None else "N/A"
+            fps = f"{p['fps']:.2f}" if p["fps"] is not None else "N/A"
+            row = [name, str(p["thread_num"]), avg, fps]
+            print("|" + "|".join(
+                f" {v:<{w}} " for v, w in zip(row, col_widths)
+            ) + "|")
+            name = ""
+        print(sep)
+
+
+def validate_config(config: dict, input_dir: Path) -> list:
+    """Print and validate task.json, return normalized model list. Exit on error."""
+    print("=" * 60)
+    print("task.json content:")
+    print(json.dumps(config, indent=2, ensure_ascii=False))
+    print("=" * 60)
+
+    errors = []
+
+    # --- frame_count ---
+    if "frame_count" in config:
+        fc = config["frame_count"]
+        # bool is a subclass of int, so reject JSON true/false explicitly
+        if isinstance(fc, bool) or not isinstance(fc, int) or fc <= 0:
+            errors.append("  [frame_count] must be a positive integer")
+
+    # --- Detect config format (single model vs. model list) ---
+    has_single = "model_relative_path" in config
+    has_multi  = "models" in config
+
+    if not has_single and not has_multi:
+        errors.append("  missing required field: 'model_relative_path' or 'models'")
+        for e in errors:
+            print(f"[ERROR] {e}", file=sys.stderr)
+        sys.exit(1)
+
+    if has_single and has_multi:
+        errors.append("  ambiguous: both 'model_relative_path' and 'models' are present, use one")
+
+    models = []
+
+    if has_single:
+        rel = config["model_relative_path"]
+        if not isinstance(rel, str) or not rel.strip():
+            errors.append("  [model_relative_path] must be a non-empty string")
+        elif not rel.endswith((".hbm", ".bin")):
+            errors.append(f"  [model_relative_path] unsupported extension: '{rel}' (expected .hbm or .bin)")
+        else:
+            models = [{"name": Path(rel).name, "path": rel}]
+
+    if has_multi:
+        if not isinstance(config["models"], list) or len(config["models"]) == 0:
+            errors.append("  [models] must be a non-empty list")
+        else:
+            for i, m in enumerate(config["models"]):
+                prefix = f"  [models[{i}]]"
+                if not isinstance(m, dict):
+                    errors.append(f"{prefix} each entry must be an object")
+                    continue
+                if "path" not in m:
+                    errors.append(f"{prefix} missing required field 'path'")
+                elif not isinstance(m["path"], str) or not m["path"].strip():
+                    errors.append(f"{prefix} 'path' must be a non-empty string")
+                elif not m["path"].endswith((".hbm", ".bin")):
+                    errors.append(f"{prefix} unsupported extension: '{m['path']}' (expected .hbm or .bin)")
+                else:
+                    models.append({"name": m.get("name", Path(m["path"]).name), "path": m["path"]})
+
+    if errors:
+        for e in errors:
+            print(f"[ERROR] {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # --- Check that model files exist ---
+    missing = []
+    for m in models:
+        full = input_dir / m["path"]
+        if not full.exists():
+            missing.append(f"  model file not found: {full}")
+    if missing:
+        for e in missing:
+            print(f"[ERROR] {e}", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"[OK] {len(models)} model(s) validated, frame_count={config.get('frame_count', 200)}\n")
+    return models
+
+
+def main():
+    parser = argparse.ArgumentParser(description="BPU model perf benchmark")
+    parser.add_argument("--input", required=True, help="Input JSON file")
+    parser.add_argument("--output", required=True, help="Output JSON file")
+    args = parser.parse_args()
+
+    if not Path(args.input).exists():
+        print(f"[ERROR] input file not found: {args.input}", file=sys.stderr)
+        sys.exit(1)
+
+    with open(args.input, encoding="utf-8") as f:
+        config = json.load(f)
+
+    input_dir = Path(args.input).parent
+    frame_count = config.get("frame_count", 200)
+    models = validate_config(config, input_dir)
+
+    if not models:
+        print("[ERROR] no models specified in input JSON", file=sys.stderr)
+        sys.exit(1)
+
+    output_results = []
+
+    for model in models:
+        rel_path = model["path"]
+        model_path = str(input_dir / rel_path)
+        model_name = model.get("name", Path(rel_path).name)
+        print(f"\n[Benchmarking] {model_name} ({model_path})")
+
+        perf_results = []
+        for t in THREAD_COUNTS:
+            print(f"  thread_num={t} ...", end=" ", flush=True)
+            p = run_perf(model_path, t, frame_count)
+            perf_results.append(p)
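+            # Both values are formatted below, so both must have been parsed
+            if p["fps"] is not None and p["avg_latency_ms"] is not None: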
+                print(f"FPS={p['fps']:.2f}  avg_latency={p['avg_latency_ms']:.3f}ms")
+            else:
+                print("FAILED (check raw_output in result JSON)")
+
+        output_results.append({
+            "model_name": model_name,
+            "model_path": model_path,
+            "perf_results": perf_results,
+        })
+
+    print("\n" + "=" * 80)
+    print_table(output_results)
+
+    output_path = Path(args.output)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    with open(output_path, "w") as f:
+        json.dump(output_results, f, indent=2)
+
+    print(f"\nResults saved to: {args.output}")
+
+
+if __name__ == "__main__":
+    main()
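The parsing step in `run_perf` can be exercised without a board by feeding it captured text. The sketch below reuses the four regular expressions from perf.py against the tail of the thread_num 4 `raw_output` recorded in the example result.json; `PATTERNS` and `parse_perf_output` are hypothetical names introduced for this illustration.

```python
import re

# Tail of the hrt_model_exec output from the thread_num=4 example run.
SAMPLE = (
    "Running condition:\n"
    "  Thread number is: 4\n"
    "  Frame count   is: 20\n"
    "  Program run time: 18.174 ms\n"
    "Perf result:\n"
    "  Frame totally latency is: 64.363 ms\n"
    "  Average    latency    is: 3.217 ms\n"
    "  Frame      rate       is: 1092.826 FPS\n"
)

# Same patterns as run_perf, keyed by the result field they fill.
PATTERNS = {
    "run_time_ms": r"Program run time:\s*([\d.]+)\s*ms",
    "total_latency_ms": r"Frame totally latency is:\s*([\d.]+)\s*ms",
    "avg_latency_ms": r"Average\s+latency\s+is:\s*([\d.]+)\s*ms",
    "fps": r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS",
}


def parse_perf_output(output: str) -> dict:
    """Extract the perf metrics; a field is None when its line is absent."""
    parsed = {}
    for key, pattern in PATTERNS.items():
        m = re.search(pattern, output)
        parsed[key] = float(m.group(1)) if m else None
    return parsed


print(parse_perf_output(SAMPLE))
```

Keeping the patterns in one table makes it straightforward to add a field later, and a missing line yields `None` rather than an exception, which matches how perf.py reports failed runs.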
--- a/install_docker/install_docker.sh
+++ b/install_docker/install_docker.sh
--- a/install_docker/test_docker.sh
+++ b/install_docker/test_docker.sh