update perf docker

Cauchy WuChao 2026-03-18 17:23:23 +08:00
parent 03e2d3520d
commit a7bda96863
17 changed files with 775 additions and 189 deletions

444
README.md
View File

@ -1,29 +1,41 @@
- [Docker one-click install script](#docker一键安装脚本)
- [Docker one-click install script](#docker-一键安装脚本)
- [Supported platforms](#支持平台)
- [Installation](#安装)
- [Testing](#测试)
- [BPU model performance testing tool](#bpu-模型性能测试工具)
- [Tool overview](#工具说明)
- [Directory structure](#目录结构)
- [Using the RDK S100 series (Nash-e / Nash-m)](#rdk-s100-系列使用方法nash-e--nash-m)
- [Using the RDK X5 series (Bayes-e)](#rdk-x5-系列使用方法bayes-e)
- [Entering the container interactively](#交互式进入容器)
- [task.json protocol](#taskjson-协议说明)
- [Output format](#输出结果说明)
---
## Docker one-click install script
### Supported platforms
1. RDK X5 / RDK X5 Module (RDK OS version >= 3.3.3, MiniBoot version >= Sep-03-2025)
2. RDK S100 / RDK S100P (RDK OS version >= 4.0.4-Beta, Miniboot version >= 4.0.4-20251015222314)
| Hardware platform | BPU architecture | Required RDK OS version | Required MiniBoot version |
|----------|----------|-----------------|-------------------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | >= 4.0.4-Beta | >= 4.0.4-20251015222314 |
| RDK X5 / RDK X5 Module | Bayes-e | >= 3.3.3 | >= Sep-03-2025 |
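### Installation
Script path: `RDK_Docker_Tools/install_docker.sh`. This script installs Docker on the RDK board in one step and sets up the related network configuration.
Script path: `RDK_Docker_Tools/install_docker/install_docker.sh`. This script installs Docker on the RDK board in one step and sets up the related network configuration.
1. No reboot is required between installation and use.
2. The eth0 network configuration is not modified during installation, so SSH sessions are not dropped.
3. Run the installation as the root user.
4. Make sure the board can reach the Internet and that the `/` filesystem has 2 GB of free space.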
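Two of the prerequisites listed above (root user, 2 GB free on `/`) can be checked before the installer is started. The following is a minimal sketch and is not part of this repository: the function name `preflight_problems` is hypothetical, "2GB" is assumed to mean 2 GiB, and the Internet connectivity check is left out because it depends on the network environment.

```python
import os
import shutil

# Assumption: "2GB" is read as 2 GiB; adjust if the installer uses another threshold.
REQUIRED_FREE_BYTES = 2 * 1024**3


def preflight_problems(path="/", required=REQUIRED_FREE_BYTES):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # The installer is meant to run as root (effective UID 0).
    if os.geteuid() != 0:
        problems.append("not running as root")
    # shutil.disk_usage reports total/used/free bytes for the filesystem holding `path`.
    free = shutil.disk_usage(path).free
    if free < required:
        problems.append(
            f"only {free / 1024**3:.2f} GiB free on {path}, "
            f"need {required / 1024**3:.0f} GiB"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print(f"[WARN] {problem}")
```

An empty result means it is reasonable to proceed with the install command that follows.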
```bash
bash install_docker.sh
bash install_docker/install_docker.sh
```
Expected output:
@ -42,131 +54,23 @@ bash install_docker.sh
[INFO] 注意: 本脚本不会修改 eth0 的任何配置
[INFO] 检查 Docker 是否已安装...
[INFO] 检查依赖命令...
[WARN] 以下命令缺失: curl, 尝试安装...
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libcurl4 libcurl4-openssl-dev
Suggested packages:
libcurl4-doc libidn11-dev librtmp-dev libssh2-1-dev
The following NEW packages will be installed:
curl
The following packages will be upgraded:
libcurl4 libcurl4-openssl-dev
2 upgraded, 1 newly installed, 0 to remove and 414 not upgraded.
Need to get 190 kB/867 kB of archives.
After this operation, 439 kB of additional disk space will be used.
Get:1 http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports jammy-updates/main arm64 curl arm64 7.81.0-1ubuntu1.23 [190 kB]
Fetched 190 kB in 0s (475 kB/s)
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
(Reading database ... 219577 files and directories currently installed.)
Preparing to unpack .../libcurl4-openssl-dev_7.81.0-1ubuntu1.23_arm64.deb ...
Unpacking libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
Preparing to unpack .../libcurl4_7.81.0-1ubuntu1.23_arm64.deb ...
Unpacking libcurl4:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
Selecting previously unselected package curl.
Preparing to unpack .../curl_7.81.0-1ubuntu1.23_arm64.deb ...
Unpacking curl (7.81.0-1ubuntu1.23) ...
Setting up libcurl4:arm64 (7.81.0-1ubuntu1.23) ...
Setting up curl (7.81.0-1ubuntu1.23) ...
Setting up libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
[OK] 依赖命令检查完毕 ✓
[INFO] 检查网络连通性...
[OK] 可访问 Ubuntu ports 源 (ports.ubuntu.com) ✓
[INFO] 更新 apt 软件包缓存...
[OK] apt 缓存更新完毕 ✓
[INFO] 安装必要依赖包...
[INFO] 安装缺失依赖: apt-transport-https
...
[OK] 依赖包就绪 ✓
[INFO] 开始安装 Docker (docker.io)...
...
Adding group `docker' (GID 135) ...
Done.
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
invoke-rc.d: initscript docker, action "start" failed.
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2026-03-18 11:12:24 CST; 7ms ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 231291 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 231291 (code=exited, status=1/FAILURE)
Setting up dnsmasq-base (2.90-0ubuntu0.22.04.1) ...
Setting up docker-buildx (0.21.3-0ubuntu1~22.04.1) ...
Setting up ubuntu-fan (0.12.16) ...
Created symlink /etc/systemd/system/multi-user.target.wants/ubuntu-fan.service → /lib/systemd/system/ubuntu-fan.service.
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
[OK] Docker 安装完毕 ✓
[INFO] 检查 iptables 后端兼容性...
update-alternatives: using /usr/sbin/iptables-legacy to provide /usr/sbin/iptables (iptables) in manual mode
update-alternatives: using /usr/sbin/ip6tables-legacy to provide /usr/sbin/ip6tables (ip6tables) in manual mode
[OK] 已切换到 iptables-legacy 后端 ✓
[INFO] 配置 Docker 镜像加速源...
[WARN] 内核缺少 iptable_raw 模块, 将启用 allow-direct-routing 绕过 raw 表限制
[OK] 镜像加速配置完毕 ✓
[INFO] 启动 Docker 服务...
[OK] Docker 服务已通过 systemd 启动 ✓
[INFO] 检查 Docker Hub 及镜像源 DNS 解析...
[OK] DNS hosts 修复完毕 ✓
[INFO] 验证 Docker 安装...
[OK] Docker 版本: Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1
[OK] dockerd 进程运行中 ✓
[OK] Docker socket 存在: /var/run/docker.sock ✓
[OK] docker info 执行成功 ✓
[INFO] 验证核心命令可用性...
[OK] docker pull --help ✓
[OK] docker push --help ✓
[OK] docker commit --help ✓
[OK] docker export --help ✓
[OK] docker load --help ✓
[OK] docker images ✓
[OK] docker ps ✓
...
============================================================
[OK] Docker 安装完成!
============================================================
常用命令速查:
docker pull <镜像> # 拉取镜像
docker push <镜像> # 推送镜像
docker commit <容器> <镜像> # 提交容器为镜像
docker export <容器> -o x.tar # 导出容器
docker load -i <文件.tar> # 加载镜像
docker images # 查看本地镜像
docker ps -a # 查看所有容器
注意事项:
- eth0 网络配置未做任何修改
- 本次安装无需重启系统
- 镜像加速配置: /etc/docker/daemon.json
- Docker 日志: journalctl -u docker 或 /var/log/dockerd.log
```
### Testing
```bash
bash test_docker.sh
bash install_docker/test_docker.sh
```
Expected output:
@ -182,99 +86,263 @@ bash test_docker.sh
------------------------------------------------------------
[INFO] [1/5] 测试 docker pull...
------------------------------------------------------------
[INFO] 尝试拉取: hello-world
Using default tag: latest
latest: Pulling from library/hello-world
198f93fd5094: Pull complete
Digest: sha256:85404b3c53951c3ff5d40de0972b1bb21fafa2e8daa235355baf44f33db9dbdd
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-world:latest
[OK] docker pull 成功 ✓
------------------------------------------------------------
[INFO] [2/5] 测试 docker run...
------------------------------------------------------------
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(arm64v8)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
[OK] docker run 成功 ✓
------------------------------------------------------------
[INFO] [3/5] 测试 docker commit...
------------------------------------------------------------
sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
[OK] docker commit 成功 -> hello-world-committed:test ✓
[OK] commit 镜像验证通过 ✓
------------------------------------------------------------
[INFO] [4/5] 测试 docker export...
------------------------------------------------------------
[OK] docker export 成功 -> /tmp/docker_test_232981/hello-world-export.tar (16K)
[OK] docker export 成功 ✓
------------------------------------------------------------
[INFO] [5/5] 测试 docker save & load...
------------------------------------------------------------
[OK] docker save 成功 -> /tmp/docker_test_232981/hello-world-load.tar (20K) ✓
Untagged: hello-world-committed:test
Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
[INFO] 已删除本地镜像 hello-world-committed:test, 准备 load...
Loaded image: hello-world-committed:test
[OK] docker save 成功 ✓
[OK] docker load 成功, 镜像已恢复 ✓
------------------------------------------------------------
[INFO] [附] docker push 说明
------------------------------------------------------------
[WARN] docker push 需要登录镜像仓库, 本脚本不自动执行
推送到 Docker Hub:
docker login
docker tag hello-world-committed:test <你的用户名>/hello-world:test
docker push <你的用户名>/hello-world:test
推送到私有仓库:
docker tag hello-world-committed:test <仓库地址>/hello-world:test
docker push <仓库地址>/hello-world:test
------------------------------------------------------------
[INFO] 清理测试资源...
------------------------------------------------------------
docker_test_232981
[OK] 容器已清理 ✓
Untagged: hello-world-committed:test
Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
[OK] commit 镜像已清理 ✓
[OK] 临时文件已清理 ✓
============================================================
[OK] 全部测试通过!
============================================================
测试结果汇总:
✓ docker pull - 镜像拉取
✓ docker run - 容器运行
✓ docker commit - 容器提交为镜像
✓ docker export - 容器导出为 tar
✓ docker save - 镜像保存为 tar
✓ docker load - 从 tar 加载镜像
- docker push - 需手动配置仓库后执行
```
---
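## BPU model performance testing tool
### Tool overview
Built on the `hrt_model_exec perf` command, the tool benchmarks each specified `.hbm` / `.bin` model file with 1, 2, 3, and 4 threads, prints the average latency and FPS for each thread count, and saves the results to a JSON file.
| Hardware platform | BPU architecture | Image name |
|----------|----------|----------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | `hrt_perf_nashem:v3.7.3` |
| RDK X5 / RDK X5 Module | Bayes-e | `hrt_perf_bayese:v1.24.5` |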
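The latency and FPS figures are read from the text that `hrt_model_exec perf` prints. The sketch below illustrates that extraction step in isolation. The sample text is the tail of the `raw_output` field in the example `result.json` added by this commit, the regular expressions mirror those in `workspace/perf.py`, and the helper name `parse_perf_output` is hypothetical.

```python
import re

# Tail of `hrt_model_exec perf` output, copied from the example result.json in this commit.
SAMPLE = """Running condition:
  Thread number is: 1
  Frame count is: 20
  Program run time: 32.467 ms
Perf result:
  Frame totally latency is: 30.841 ms
  Average latency is: 1.542 ms
  Frame rate is: 620.790 FPS
"""

# One pattern per metric; each captures a decimal number.
PATTERNS = {
    "run_time_ms": r"Program run time:\s*([\d.]+)\s*ms",
    "total_latency_ms": r"Frame totally latency is:\s*([\d.]+)\s*ms",
    "avg_latency_ms": r"Average\s+latency\s+is:\s*([\d.]+)\s*ms",
    "fps": r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS",
}


def parse_perf_output(text):
    """Extract the numeric metrics; a metric that is not found stays None."""
    metrics = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        metrics[key] = float(match.group(1)) if match else None
    return metrics


if __name__ == "__main__":
    print(parse_perf_output(SAMPLE))
```

Leaving missing metrics as `None` rather than raising lets a failed run still be recorded alongside its raw output.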
### Directory structure
```
bpu_model_perf_images/
├── Dockerfile.nashem_perf        # Dockerfile for RDK S100 (Nash-e/Nash-m)
├── Dockerfile.bayese_perf        # Dockerfile for RDK X5 (Bayes-e)
├── docker_build_nashem.sh        # Image build script for RDK S100
├── docker_build_bayese.sh        # Image build script for RDK X5
├── docker_run_nashem_perf.sh     # Automated perf run script for RDK S100
├── docker_run_bayese_perf.sh     # Automated perf run script for RDK X5
├── docker_test_nashem.sh         # Interactive container script for RDK S100
├── docker_test_bayese.sh         # Interactive container script for RDK X5
├── workspace/
│   ├── perf.py                   # Main benchmark script (shared by both platforms)
│   └── entrypoint.sh             # Container entrypoint script (shared by both platforms)
├── example_fs/                   # Example input/output for RDK S100
│   ├── input/
│   │   ├── task.json             # Input protocol file
│   │   └── *.hbm                 # Model files (place them here)
│   └── output/
│       └── result.json           # Generated automatically after a run
└── example_fs_bayese/            # Example input/output for RDK X5
    ├── input/
    │   ├── task.json             # Input protocol file
    │   └── *.hbm / *.bin         # Model files (place them here)
    └── output/
        └── result.json           # Generated automatically after a run
```
### Using the RDK S100 series (Nash-e / Nash-m)
**Step 1: Build the image** (build once; afterwards the image can be delivered as-is)
```bash
cd bpu_model_perf_images
bash docker_build_nashem.sh
```
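The build script locates `hrt_model_exec` on the board automatically (via `which hrt_model_exec`) and packages it into the image; no manual steps are needed.
**Step 2: Prepare the input**
Place the model file (`.hbm`) in `example_fs/input/` and edit `example_fs/input/task.json`: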
```json
{
"model_relative_path": "your_model.hbm",
"frame_count": 200
}
```
**Step 3: Run**
```bash
bash docker_run_nashem_perf.sh
```
The run script mounts `example_fs/input` at `/workspace/input` and `example_fs/output` at `/workspace/output` inside the container.
**Expected output**
```text
============================================================
task.json content:
{
"model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
"frame_count": 200
}
============================================================
[OK] 1 model(s) validated, frame_count=200
[Benchmarking] yolo11n_detect_nashe_640x640_nv12.hbm (/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm)
thread_num=1 ... FPS=625.20 avg_latency=1.538ms
thread_num=2 ... FPS=1133.43 avg_latency=1.703ms
thread_num=3 ... FPS=1171.23 avg_latency=2.482ms
thread_num=4 ... FPS=1167.98 avg_latency=3.322ms
================================================================================
+---------------------------------------+-----------+------------------+------------+
| Model | Threads | Avg Latency(ms) | FPS |
+---------------------------------------+-----------+------------------+------------+
| yolo11n_detect_nashe_640x640_nv12.hbm | 1 | 1.538 | 625.20 |
| | 2 | 1.703 | 1133.43 |
| | 3 | 2.482 | 1171.23 |
| | 4 | 3.322 | 1167.98 |
+---------------------------------------+-----------+------------------+------------+
Results saved to: /workspace/output/result.json
```
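To read the table above in terms of scaling, the FPS of each thread count can be compared with the single-thread run. This is a small sketch using the FPS values from the expected output; the function name `scaling` is hypothetical and not part of the tool.

```python
# FPS per thread count, copied from the expected output above.
fps_by_threads = {1: 625.20, 2: 1133.43, 3: 1171.23, 4: 1167.98}


def scaling(fps):
    """Return throughput relative to the single-thread run for each thread count."""
    base = fps[1]
    return {threads: value / base for threads, value in sorted(fps.items())}


if __name__ == "__main__":
    for threads, speedup in scaling(fps_by_threads).items():
        print(f"threads={threads} speedup={speedup:.2f}x")
```

For this model, throughput nearly doubles from one to two threads and then levels off, while average latency keeps growing, so the extra threads mostly add queueing time.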
### Using the RDK X5 series (Bayes-e)
**Step 1: Build the image**
```bash
cd bpu_model_perf_images
bash docker_build_bayese.sh
```
**Step 2: Prepare the input**
Place the model file (`.hbm` / `.bin`) in `example_fs_bayese/input/` and edit `example_fs_bayese/input/task.json`:
```json
{
"model_relative_path": "your_model.hbm",
"frame_count": 200
}
```
**Step 3: Run**
```bash
bash docker_run_bayese_perf.sh
```
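### Entering the container interactively
The image ships with BPU tools such as `hrt_model_exec`, so you can enter the container interactively and debug models by hand.
**RDK S100 (Nash-e / Nash-m)**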
```bash
cd bpu_model_perf_images
bash docker_test_nashem.sh
```
**RDK X5 (Bayes-e)**
```bash
cd bpu_model_perf_images
bash docker_test_bayese.sh
```
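Once inside the container, `hrt_model_exec` can be used directly, for example:
```bash
# Show help
hrt_model_exec --help
# Run perf manually
hrt_model_exec perf --model_file /workspace/input/xxx.hbm --thread_num 2 --frame_count 200
# Run inference manually
hrt_model_exec infer --model_file /workspace/input/xxx.hbm --input_file input.jpg
```
> The container starts in `--privileged` mode with host directories such as `/usr/hobot` and `/opt/hobot` mounted, so the libraries match those on the board exactly. The container is removed automatically on exit (`--rm`).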
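### task.json protocol
Two formats are supported:
**Single model**
| Field | Type | Required | Description |
|------|------|------|------|
| `model_relative_path` | string | Yes | Model file name, relative to the input directory; the extension must be `.hbm` or `.bin` |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |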
```json
{
"model_relative_path": "model.hbm",
"frame_count": 200
}
```
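**Multiple models**
| Field | Type | Required | Description |
|------|------|------|------|
| `models` | list | Yes | List of models |
| `models[].path` | string | Yes | Model file name, relative to the input directory; the extension must be `.hbm` or `.bin` |
| `models[].name` | string | No | Display name; defaults to the file name |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |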
```json
{
"models": [
{"name": "model_a", "path": "model_a.hbm"},
{"name": "model_b", "path": "model_b.bin"}
],
"frame_count": 200
}
```
> Note: `model_relative_path` and `models` must not appear together.
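When many models need to be benchmarked, `task.json` can be generated rather than written by hand. The sketch below is one way to do that under the rules stated above. The helper `build_task` is hypothetical, and using the file stem as the display name is an illustrative choice (the tool itself defaults to the file name).

```python
import json
from pathlib import Path

ALLOWED_SUFFIXES = (".hbm", ".bin")


def build_task(model_paths, frame_count=200):
    """Build a task.json dict: single-model form for one path, multi-model form otherwise."""
    if not model_paths:
        raise ValueError("at least one model path is required")
    for path in model_paths:
        if not path.endswith(ALLOWED_SUFFIXES):
            raise ValueError(f"unsupported extension: {path}")
    # Never emit both keys: the protocol treats that as ambiguous.
    if len(model_paths) == 1:
        return {"model_relative_path": model_paths[0], "frame_count": frame_count}
    return {
        "models": [{"name": Path(p).stem, "path": p} for p in model_paths],
        "frame_count": frame_count,
    }


if __name__ == "__main__":
    task = build_task(["model_a.hbm", "model_b.bin"])
    print(json.dumps(task, indent=2))
```

The resulting dictionary can be written to `input/task.json` with `json.dump`.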
### Output format
Results are saved to `output/result.json` with the following structure:
```json
[
{
"model_name": "model.hbm",
"model_path": "/workspace/input/model.hbm",
"perf_results": [
{
"thread_num": 1,
"frame_count": 200,
"run_time_ms": 320.18,
"total_latency_ms": 307.53,
"avg_latency_ms": 1.538,
"fps": 625.197,
"raw_output": "...",
"returncode": 0
}
]
}
]
```
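A consumer of `result.json` will usually want the best configuration per model while ignoring failed runs. The following sketch assumes only the field names shown above. The helper `best_runs` is hypothetical, and the inline sample (including its failed third run) is made up to exercise the filter; a real script would load `output/result.json` instead.

```python
import json


def best_runs(results):
    """For each model, return the successful run with the highest FPS, or None."""
    summary = {}
    for entry in results:
        # A run counts as successful if the tool exited with 0 and an FPS value was parsed.
        ok = [
            run for run in entry["perf_results"]
            if run.get("returncode") == 0 and run.get("fps") is not None
        ]
        summary[entry["model_name"]] = max(ok, key=lambda run: run["fps"]) if ok else None
    return summary


# Illustrative sample; replace with json.load(...) on the real result file.
sample = json.loads("""
[
  {
    "model_name": "model.hbm",
    "model_path": "/workspace/input/model.hbm",
    "perf_results": [
      {"thread_num": 1, "fps": 625.197, "avg_latency_ms": 1.538, "returncode": 0},
      {"thread_num": 2, "fps": 1133.43, "avg_latency_ms": 1.703, "returncode": 0},
      {"thread_num": 3, "fps": null, "avg_latency_ms": null, "returncode": 1}
    ]
  }
]
""")

if __name__ == "__main__":
    for name, run in best_runs(sample).items():
        print(name, "->", run["thread_num"] if run else "no successful run")
```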

View File

@ -0,0 +1,65 @@
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
# ============================================================
# Base tools
# SSH server installation and configuration
# ============================================================
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
python3-venv \
git \
wget \
curl \
vim \
sshpass \
i2c-tools \
software-properties-common \
gnupg2 \
lsb-release \
openssh-server \
locales \
&& mkdir -p /var/run/sshd \
&& rm -rf /var/lib/apt/lists/*
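# Configure the Chinese locale to avoid garbled Chinese text in the terminal
RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LC_ALL=zh_CN.UTF-8
# SSH configuration: allow root password login, limit auth attempts and idle sessions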
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
&& sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
&& echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
&& echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
&& echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
&& echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
&& echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config
# Generate SSH host keys (otherwise created on first container start; pre-generating them here speeds up startup)
RUN ssh-keygen -A
# Set the root password
RUN echo 'root:root' | chpasswd
# ============================================================
# BPU model perf tool
# ============================================================
# Copy hrt_model_exec (taken from its path on the host board)
COPY hrt_model_exec /usr/local/bin/hrt_model_exec
RUN chmod +x /usr/local/bin/hrt_model_exec
# Copy the perf script and the entrypoint
COPY workspace/perf.py /workspace/perf/perf.py
COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
RUN chmod +x /workspace/perf/entrypoint.sh
# Working directory and mount points
RUN mkdir -p /workspace/input /workspace/output
WORKDIR /workspace/perf
ENTRYPOINT ["/workspace/perf/entrypoint.sh"]

View File

@ -0,0 +1,66 @@
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
# ============================================================
# Base tools
# SSH server installation and configuration
# ============================================================
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
python3-venv \
git \
wget \
curl \
vim \
sshpass \
i2c-tools \
software-properties-common \
gnupg2 \
lsb-release \
openssh-server \
locales \
&& mkdir -p /var/run/sshd \
&& rm -rf /var/lib/apt/lists/*
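# Configure the Chinese locale to avoid garbled Chinese text in the terminal
RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LC_ALL=zh_CN.UTF-8
# SSH configuration: allow root password login, limit auth attempts and idle sessions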
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
&& sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
&& echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
&& echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
&& echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
&& echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
&& echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config
# Generate SSH host keys (otherwise created on first container start; pre-generating them here speeds up startup)
RUN ssh-keygen -A
# Set the root password
RUN echo 'root:root' | chpasswd
# ============================================================
# BPU model perf tool
# ============================================================
# Copy hrt_model_exec (taken from its path on the host board)
COPY hrt_model_exec /usr/local/bin/hrt_model_exec
RUN chmod +x /usr/local/bin/hrt_model_exec
# Copy the perf script and the entrypoint
COPY workspace/perf.py /workspace/perf/perf.py
COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
RUN chmod +x /workspace/perf/entrypoint.sh
# Working directory and mount points
RUN mkdir -p /workspace/input /workspace/output
WORKDIR /workspace/perf
ENTRYPOINT ["/workspace/perf/entrypoint.sh"]

View File

@ -0,0 +1,14 @@
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Copy hrt_model_exec into build context
HRT_BIN="$(which hrt_model_exec)"
echo "Copying hrt_model_exec from $HRT_BIN"
cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"
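# Reference the Dockerfile through SCRIPT_DIR so the build also works from another directory
docker build -f "$SCRIPT_DIR/Dockerfile.bayese_perf" -t hrt_perf_bayese:v1.24.5 "$SCRIPT_DIR"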
# Clean up copied binary
rm -f "$SCRIPT_DIR/hrt_model_exec"

View File

@ -0,0 +1,14 @@
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Copy hrt_model_exec into build context
HRT_BIN="$(which hrt_model_exec)"
echo "Copying hrt_model_exec from $HRT_BIN"
cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"
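# Reference the Dockerfile through SCRIPT_DIR so the build also works from another directory
docker build -f "$SCRIPT_DIR/Dockerfile.nashem_perf" -t hrt_perf_nashem:v3.7.3 "$SCRIPT_DIR"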
# Clean up copied binary
rm -f "$SCRIPT_DIR/hrt_model_exec"

View File

@ -0,0 +1,16 @@
#!/bin/bash
# Run the bayese perf container
# Input: /workspace/input/task.json (mounted from host)
# Output: /workspace/output/result.json (mounted from host)
docker run --rm \
--name hrt-perf-bayese \
--privileged \
-v /opt/hobot:/opt/hobot:ro \
-v /usr/hobot:/usr/hobot:ro \
-v /opt/tros:/opt/tros:ro \
-v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
-e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
-v "$(pwd)/example_fs_bayese/input":/workspace/input:ro \
-v "$(pwd)/example_fs_bayese/output":/workspace/output \
hrt_perf_bayese:v1.24.5

View File

@ -0,0 +1,16 @@
#!/bin/bash
# Run the nashem perf container
# Input: /workspace/input/task.json (mounted from host)
# Output: /workspace/output/result.json (mounted from host)
docker run --rm \
--name hrt-perf-nashem \
--privileged \
-v /opt/hobot:/opt/hobot:ro \
-v /usr/hobot:/usr/hobot:ro \
-v /opt/tros:/opt/tros:ro \
-v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
-e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
-v "$(pwd)/example_fs/input":/workspace/input:ro \
-v "$(pwd)/example_fs/output":/workspace/output \
hrt_perf_nashem:v3.7.3

View File

@ -0,0 +1,15 @@
#!/bin/bash
# Start the Bayes-e perf container interactively (entrypoint is /bin/bash)
# For interactive use of BPU tools (hrt_model_exec, etc.) inside the container
docker run -it --rm \
--name hrt-perf-bayese-interactive \
--privileged \
-v /opt/hobot:/opt/hobot:ro \
-v /usr/hobot:/usr/hobot:ro \
-v /opt/tros:/opt/tros:ro \
-v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
-e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
-e ROS_DOMAIN_ID=0 \
--entrypoint /bin/bash \
hrt_perf_bayese:v1.24.5

View File

@ -0,0 +1,15 @@
#!/bin/bash
# Start the Nash-e/Nash-m perf container interactively (entrypoint is /bin/bash)
# For interactive use of BPU tools (hrt_model_exec, etc.) inside the container
docker run -it --rm \
--name hrt-perf-nashem-interactive \
--privileged \
-v /opt/hobot:/opt/hobot:ro \
-v /usr/hobot:/usr/hobot:ro \
-v /opt/tros:/opt/tros:ro \
-v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
-e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
-e ROS_DOMAIN_ID=0 \
--entrypoint /bin/bash \
hrt_perf_nashem:v3.7.3

View File

@ -0,0 +1,4 @@
{
"model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
"frame_count": 20
}

View File

@ -0,0 +1,48 @@
[
{
"model_name": "yolo11n_detect_nashe_640x640_nv12.hbm",
"model_path": "/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm",
"perf_results": [
{
"thread_num": 1,
"frame_count": 20,
"run_time_ms": 32.467,
"total_latency_ms": 30.841,
"avg_latency_ms": 1.542,
"fps": 620.79,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 1 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473143438048][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 352.489ms.\n\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 1\n Frame count is: 20\n Program run time: 32.467 ms\nPerf result:\n Frame totally latency is: 30.841 ms\n Average latency is: 1.542 ms\n Frame rate is: 620.790 FPS\n",
"returncode": 0
},
{
"thread_num": 2,
"frame_count": 20,
"run_time_ms": 19.705,
"total_latency_ms": 36.596,
"avg_latency_ms": 1.837,
"fps": 1016.369,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 2 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472894728928][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 356.068ms.\n\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 2\n Frame count is: 20\n Program run time: 19.705 ms\nPerf result:\n Frame totally latency is: 36.596 ms\n Average latency is: 1.837 ms\n Frame rate is: 1016.369 FPS\n",
"returncode": 0
},
{
"thread_num": 3,
"frame_count": 20,
"run_time_ms": 18.084,
"total_latency_ms": 49.101,
"avg_latency_ms": 2.456,
"fps": 1111.797,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 3 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473576172256][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.722ms.\n\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 3\n Frame count is: 20\n Program run time: 18.084 ms\nPerf result:\n Frame totally latency is: 49.101 ms\n Average latency is: 2.456 ms\n Frame rate is: 1111.797 FPS\n",
"returncode": 0
},
{
"thread_num": 4,
"frame_count": 20,
"run_time_ms": 18.174,
"total_latency_ms": 64.363,
"avg_latency_ms": 3.217,
"fps": 1092.826,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 4 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472884636384][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.186ms.\n\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 4\n Frame count is: 20\n Program run time: 18.174 ms\nPerf result:\n Frame totally latency is: 64.363 ms\n Average latency is: 3.217 ms\n Frame rate is: 1092.826 FPS\n",
"returncode": 0
}
]
}
]

View File

@ -0,0 +1,4 @@
{
"model_relative_path": "your_bayese_model.hbm",
"frame_count": 200
}

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -e
INPUT_DIR="/workspace/input"
OUTPUT_DIR="/workspace/output"
INPUT_JSON="${INPUT_DIR}/task.json"
OUTPUT_JSON="${OUTPUT_DIR}/result.json"
# Allow overrides via env vars
INPUT_JSON="${PERF_INPUT:-$INPUT_JSON}"
OUTPUT_JSON="${PERF_OUTPUT:-$OUTPUT_JSON}"
if [ ! -f "$INPUT_JSON" ]; then
echo "ERROR: input file not found: $INPUT_JSON"
exit 1
fi
mkdir -p "$(dirname "$OUTPUT_JSON")"
exec python3 /workspace/perf/perf.py --input "$INPUT_JSON" --output "$OUTPUT_JSON"

View File

@ -0,0 +1,221 @@
#!/usr/bin/env python3
"""
BPU Model Performance Benchmark Tool
Runs hrt_model_exec perf for each model at thread counts 1,2,3,4
"""
import argparse
import json
import re
import subprocess
import sys
from pathlib import Path
HRT_MODEL_EXEC = "hrt_model_exec"
THREAD_COUNTS = [1, 2, 3, 4]
def run_perf(model_path: str, thread_num: int, frame_count: int = 200) -> dict:
    """Run hrt_model_exec perf and return parsed results."""
    cmd = [
        HRT_MODEL_EXEC, "perf",
        "--model_file", model_path,
        "--thread_num", str(thread_num),
        "--frame_count", str(frame_count),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr

    perf = {
        "thread_num": thread_num,
        "frame_count": frame_count,
        "run_time_ms": None,
        "total_latency_ms": None,
        "avg_latency_ms": None,
        "fps": None,
        "raw_output": output,
        "returncode": result.returncode,
    }

    m = re.search(r"Program run time:\s*([\d.]+)\s*ms", output)
    if m:
        perf["run_time_ms"] = float(m.group(1))
    m = re.search(r"Frame totally latency is:\s*([\d.]+)\s*ms", output)
    if m:
        perf["total_latency_ms"] = float(m.group(1))
    m = re.search(r"Average\s+latency\s+is:\s*([\d.]+)\s*ms", output)
    if m:
        perf["avg_latency_ms"] = float(m.group(1))
    m = re.search(r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS", output)
    if m:
        perf["fps"] = float(m.group(1))
    return perf

def print_table(results: list):
    """Print results as a human-readable table."""
    # Compute the Model column width dynamically
    max_name_len = max(len(e["model_name"]) for e in results)
    col_widths = [max(max_name_len, 10), 9, 16, 10]
    headers = ["Model", "Threads", "Avg Latency(ms)", "FPS"]
    sep = "+" + "+".join("-" * (w + 2) for w in col_widths) + "+"
    header_row = "|" + "|".join(
        f" {h:<{w}} " for h, w in zip(headers, col_widths)
    ) + "|"
    for entry in results:
        name = entry["model_name"]
        # Print a separate header for each model
        print(sep)
        print(header_row)
        print(sep)
        for p in entry["perf_results"]:
            avg = f"{p['avg_latency_ms']:.3f}" if p["avg_latency_ms"] is not None else "N/A"
            fps = f"{p['fps']:.2f}" if p["fps"] is not None else "N/A"
            row = [name, str(p["thread_num"]), avg, fps]
            print("|" + "|".join(
                f" {v:<{w}} " for v, w in zip(row, col_widths)
            ) + "|")
            # Show the model name only on the first row of its group
            name = ""
    print(sep)

def validate_config(config: dict, input_dir: Path) -> list:
    """Print and validate task.json, return normalized model list. Exit on error."""
    print("=" * 60)
    print("task.json content:")
    print(json.dumps(config, indent=2, ensure_ascii=False))
    print("=" * 60)
    errors = []

    # --- frame_count ---
    if "frame_count" in config:
        if not isinstance(config["frame_count"], int) or config["frame_count"] <= 0:
            errors.append(" [frame_count] must be a positive integer")

    # --- detect which format is used ---
    has_single = "model_relative_path" in config
    has_multi = "models" in config
    if not has_single and not has_multi:
        errors.append(" missing required field: 'model_relative_path' or 'models'")
        for e in errors:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)
    if has_single and has_multi:
        errors.append(" ambiguous: both 'model_relative_path' and 'models' are present, use one")

    models = []
    if has_single:
        rel = config["model_relative_path"]
        if not isinstance(rel, str) or not rel.strip():
            errors.append(" [model_relative_path] must be a non-empty string")
        elif not rel.endswith((".hbm", ".bin")):
            errors.append(f" [model_relative_path] unsupported extension: '{rel}' (expected .hbm or .bin)")
        else:
            models = [{"name": Path(rel).name, "path": rel}]
    if has_multi:
        if not isinstance(config["models"], list) or len(config["models"]) == 0:
            errors.append(" [models] must be a non-empty list")
        else:
            for i, m in enumerate(config["models"]):
                prefix = f" [models[{i}]]"
                if not isinstance(m, dict):
                    errors.append(f"{prefix} each entry must be an object")
                    continue
                if "path" not in m:
                    errors.append(f"{prefix} missing required field 'path'")
                elif not isinstance(m["path"], str) or not m["path"].strip():
                    errors.append(f"{prefix} 'path' must be a non-empty string")
                elif not m["path"].endswith((".hbm", ".bin")):
                    errors.append(f"{prefix} unsupported extension: '{m['path']}' (expected .hbm or .bin)")
                else:
                    models.append({"name": m.get("name", Path(m["path"]).name), "path": m["path"]})
    if errors:
        for e in errors:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)

    # --- check that the model files exist ---
    missing = []
    for m in models:
        full = input_dir / m["path"]
        if not full.exists():
            missing.append(f" model file not found: {full}")
    if missing:
        for e in missing:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)

    print(f"[OK] {len(models)} model(s) validated, frame_count={config.get('frame_count', 200)}\n")
    return models

def main():
    parser = argparse.ArgumentParser(description="BPU model perf benchmark")
    parser.add_argument("--input", required=True, help="Input JSON file")
    parser.add_argument("--output", required=True, help="Output JSON file")
    args = parser.parse_args()

    if not Path(args.input).exists():
        print(f"[ERROR] input file not found: {args.input}", file=sys.stderr)
        sys.exit(1)
    # Read as UTF-8 explicitly so non-ASCII model names do not depend on the container locale
    with open(args.input, encoding="utf-8") as f:
        config = json.load(f)

    input_dir = Path(args.input).parent
    frame_count = config.get("frame_count", 200)
    models = validate_config(config, input_dir)
    if not models:
        print("[ERROR] no models specified in input JSON", file=sys.stderr)
        sys.exit(1)

    output_results = []
    for model in models:
        rel_path = model["path"]
        model_path = str(input_dir / rel_path)
        model_name = model.get("name", Path(rel_path).name)
        print(f"\n[Benchmarking] {model_name} ({model_path})")
        perf_results = []
        for t in THREAD_COUNTS:
            print(f" thread_num={t} ...", end=" ", flush=True)
            p = run_perf(model_path, t, frame_count)
            perf_results.append(p)
            if p["fps"] is not None:
                print(f"FPS={p['fps']:.2f} avg_latency={p['avg_latency_ms']:.3f}ms")
            else:
                print("FAILED (check raw_output in result JSON)")
        output_results.append({
            "model_name": model_name,
            "model_path": model_path,
            "perf_results": perf_results,
        })

    print("\n" + "=" * 80)
    print_table(output_results)

    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(output_results, f, indent=2)
    print(f"\nResults saved to: {args.output}")


if __name__ == "__main__":
    main()