update perf docker
parent
03e2d3520d
commit
a7bda96863
446
README.md
@@ -1,29 +1,41 @@
- [Docker One-Click Install Script](#docker-一键安装脚本)
  - [Supported Platforms](#支持平台)
  - [Installation](#安装)
  - [Testing](#测试)
- [BPU Model Performance Testing Tool](#bpu-模型性能测试工具)
  - [Tool Overview](#工具说明)
  - [Directory Structure](#目录结构)
  - [Usage on the RDK S100 Series (Nash-e / Nash-m)](#rdk-s100-系列使用方法nash-e--nash-m)
  - [Usage on the RDK X5 Series (Bayes-e)](#rdk-x5-系列使用方法bayes-e)
  - [Entering the Container Interactively](#交互式进入容器)
  - [task.json Protocol](#taskjson-协议说明)
  - [Output Format](#输出结果说明)

---

## Docker One-Click Install Script

### Supported Platforms

| Hardware platform | BPU architecture | Required RDK OS version | Required MiniBoot version |
|----------|----------|-----------------|-------------------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | >= 4.0.4-Beta | >= 4.0.4-20251015222314 |
| RDK X5 / RDK X5 Module | Bayes-e | >= 3.3.3 | >= Sep-03-2025 |
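
The RDK OS minimums in the table can be checked mechanically. The sketch below is illustrative only and is not part of the install script. It assumes that only the leading dotted numbers matter and that suffixes such as `-Beta` can be ignored; the function names are hypothetical. It does not handle the MiniBoot values, which use date-style formats.

```python
import re

# Minimum RDK OS versions copied from the table above.
MIN_RDK_OS = {"RDK S100": "4.0.4-Beta", "RDK X5": "3.3.3"}


def numeric_version(text):
    """Return the leading dotted-number part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric parts only; suffixes like '-Beta' are ignored (an assumption)."""
    return numeric_version(installed) >= numeric_version(minimum)
```

For example, `meets_minimum("3.3.4", MIN_RDK_OS["RDK X5"])` is true, while `meets_minimum("3.3.2", MIN_RDK_OS["RDK X5"])` is false.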

### Installation

The script lives at `RDK_Docker_Tools/install_docker/install_docker.sh`. It installs Docker on the RDK board in one step and sets up the related network configuration.

1. No reboot is needed between installation and first use.
2. The installer does not modify the eth0 network configuration, so SSH sessions stay connected.
3. Run the installer as root.
4. Make sure the board can reach the internet and that `/` has 2 GB of free space.
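
Two of the prerequisites above (root user, 2 GB free on `/`) are easy to check before running the installer. The following is a minimal sketch, not code from `install_docker.sh`: the function name is hypothetical, "2 GB" is interpreted as 2 GiB, and the internet check is left out.

```python
MIN_FREE_BYTES = 2 * 1024**3  # "2 GB" from the list above, interpreted here as 2 GiB


def preflight_problems(euid, free_bytes, min_free_bytes=MIN_FREE_BYTES):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if euid != 0:
        problems.append("not running as root")
    if free_bytes < min_free_bytes:
        problems.append("less than 2 GB free on /")
    return problems
```

On the board, the inputs could come from `os.geteuid()` and `shutil.disk_usage("/").free`. Passing them in as arguments keeps the check easy to test off-device.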

```bash
bash install_docker/install_docker.sh
```

Expected output
|
||||
@@ -42,131 +54,23 @@ bash install_docker.sh
|
||||
[INFO] 注意: 本脚本不会修改 eth0 的任何配置
|
||||
[INFO] 检查 Docker 是否已安装...
|
||||
[INFO] 检查依赖命令...
|
||||
[WARN] 以下命令缺失: curl, 尝试安装...
|
||||
Reading package lists... Done
|
||||
Building dependency tree... Done
|
||||
Reading state information... Done
|
||||
The following additional packages will be installed:
|
||||
libcurl4 libcurl4-openssl-dev
|
||||
Suggested packages:
|
||||
libcurl4-doc libidn11-dev librtmp-dev libssh2-1-dev
|
||||
The following NEW packages will be installed:
|
||||
curl
|
||||
The following packages will be upgraded:
|
||||
libcurl4 libcurl4-openssl-dev
|
||||
2 upgraded, 1 newly installed, 0 to remove and 414 not upgraded.
|
||||
Need to get 190 kB/867 kB of archives.
|
||||
After this operation, 439 kB of additional disk space will be used.
|
||||
Get:1 http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports jammy-updates/main arm64 curl arm64 7.81.0-1ubuntu1.23 [190 kB]
|
||||
Fetched 190 kB in 0s (475 kB/s)
|
||||
perl: warning: Setting locale failed.
|
||||
perl: warning: Please check that your locale settings:
|
||||
LANGUAGE = (unset),
|
||||
LC_ALL = (unset),
|
||||
LANG = "en_US.UTF-8"
|
||||
are supported and installed on your system.
|
||||
perl: warning: Falling back to the standard locale ("C").
|
||||
locale: Cannot set LC_CTYPE to default locale: No such file or directory
|
||||
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
|
||||
locale: Cannot set LC_ALL to default locale: No such file or directory
|
||||
(Reading database ... 219577 files and directories currently installed.)
|
||||
Preparing to unpack .../libcurl4-openssl-dev_7.81.0-1ubuntu1.23_arm64.deb ...
|
||||
Unpacking libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
|
||||
Preparing to unpack .../libcurl4_7.81.0-1ubuntu1.23_arm64.deb ...
|
||||
Unpacking libcurl4:arm64 (7.81.0-1ubuntu1.23) over (7.81.0-1ubuntu1.21) ...
|
||||
Selecting previously unselected package curl.
|
||||
Preparing to unpack .../curl_7.81.0-1ubuntu1.23_arm64.deb ...
|
||||
Unpacking curl (7.81.0-1ubuntu1.23) ...
|
||||
Setting up libcurl4:arm64 (7.81.0-1ubuntu1.23) ...
|
||||
Setting up curl (7.81.0-1ubuntu1.23) ...
|
||||
Setting up libcurl4-openssl-dev:arm64 (7.81.0-1ubuntu1.23) ...
|
||||
Processing triggers for man-db (2.10.2-1) ...
|
||||
Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
|
||||
[OK] 依赖命令检查完毕 ✓
|
||||
[INFO] 检查网络连通性...
|
||||
[OK] 可访问 Ubuntu ports 源 (ports.ubuntu.com) ✓
|
||||
[INFO] 更新 apt 软件包缓存...
|
||||
[OK] apt 缓存更新完毕 ✓
|
||||
[INFO] 安装必要依赖包...
|
||||
[INFO] 安装缺失依赖: apt-transport-https
|
||||
...
|
||||
[OK] 依赖包就绪 ✓
|
||||
[INFO] 开始安装 Docker (docker.io)...
|
||||
...
|
||||
Adding group `docker' (GID 135) ...
|
||||
Done.
|
||||
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
|
||||
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
|
||||
Job for docker.service failed because the control process exited with error code.
|
||||
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
|
||||
invoke-rc.d: initscript docker, action "start" failed.
|
||||
● docker.service - Docker Application Container Engine
|
||||
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
|
||||
Active: activating (auto-restart) (Result: exit-code) since Wed 2026-03-18 11:12:24 CST; 7ms ago
|
||||
TriggeredBy: ● docker.socket
|
||||
Docs: https://docs.docker.com
|
||||
Process: 231291 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
|
||||
Main PID: 231291 (code=exited, status=1/FAILURE)
|
||||
Setting up dnsmasq-base (2.90-0ubuntu0.22.04.1) ...
|
||||
Setting up docker-buildx (0.21.3-0ubuntu1~22.04.1) ...
|
||||
Setting up ubuntu-fan (0.12.16) ...
|
||||
Created symlink /etc/systemd/system/multi-user.target.wants/ubuntu-fan.service → /lib/systemd/system/ubuntu-fan.service.
|
||||
Processing triggers for man-db (2.10.2-1) ...
|
||||
Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
|
||||
Processing triggers for libc-bin (2.35-0ubuntu3.11) ...
|
||||
[OK] Docker 安装完毕 ✓
|
||||
[INFO] 检查 iptables 后端兼容性...
|
||||
update-alternatives: using /usr/sbin/iptables-legacy to provide /usr/sbin/iptables (iptables) in manual mode
|
||||
update-alternatives: using /usr/sbin/ip6tables-legacy to provide /usr/sbin/ip6tables (ip6tables) in manual mode
|
||||
[OK] 已切换到 iptables-legacy 后端 ✓
|
||||
[INFO] 配置 Docker 镜像加速源...
|
||||
[WARN] 内核缺少 iptable_raw 模块, 将启用 allow-direct-routing 绕过 raw 表限制
|
||||
[OK] 镜像加速配置完毕 ✓
|
||||
[INFO] 启动 Docker 服务...
|
||||
[OK] Docker 服务已通过 systemd 启动 ✓
|
||||
[INFO] 检查 Docker Hub 及镜像源 DNS 解析...
|
||||
[OK] DNS hosts 修复完毕 ✓
|
||||
|
||||
[INFO] 验证 Docker 安装...
|
||||
[OK] Docker 版本: Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1
|
||||
[OK] dockerd 进程运行中 ✓
|
||||
[OK] Docker socket 存在: /var/run/docker.sock ✓
|
||||
[OK] docker info 执行成功 ✓
|
||||
|
||||
[INFO] 验证核心命令可用性...
|
||||
[OK] docker pull --help ✓
|
||||
[OK] docker push --help ✓
|
||||
[OK] docker commit --help ✓
|
||||
[OK] docker export --help ✓
|
||||
[OK] docker load --help ✓
|
||||
[OK] docker images ✓
|
||||
[OK] docker ps ✓
|
||||
|
||||
...
|
||||
============================================================
|
||||
[OK] Docker 安装完成!
|
||||
============================================================
|
||||
|
||||
常用命令速查:
|
||||
docker pull <镜像> # 拉取镜像
|
||||
docker push <镜像> # 推送镜像
|
||||
docker commit <容器> <镜像> # 提交容器为镜像
|
||||
docker export <容器> -o x.tar # 导出容器
|
||||
docker load -i <文件.tar> # 加载镜像
|
||||
docker images # 查看本地镜像
|
||||
docker ps -a # 查看所有容器
|
||||
|
||||
注意事项:
|
||||
- eth0 网络配置未做任何修改
|
||||
- 本次安装无需重启系统
|
||||
- 镜像加速配置: /etc/docker/daemon.json
|
||||
- Docker 日志: journalctl -u docker 或 /var/log/dockerd.log
|
||||
```

### Testing

```bash
bash install_docker/test_docker.sh
```

Expected output
|
||||
@@ -182,99 +86,263 @@ bash test_docker.sh
|
||||
------------------------------------------------------------
|
||||
[INFO] [1/5] 测试 docker pull...
|
||||
------------------------------------------------------------
|
||||
[INFO] 尝试拉取: hello-world
|
||||
Using default tag: latest
|
||||
latest: Pulling from library/hello-world
|
||||
198f93fd5094: Pull complete
|
||||
Digest: sha256:85404b3c53951c3ff5d40de0972b1bb21fafa2e8daa235355baf44f33db9dbdd
|
||||
Status: Downloaded newer image for hello-world:latest
|
||||
docker.io/library/hello-world:latest
|
||||
[OK] docker pull 成功 ✓
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] [2/5] 测试 docker run...
|
||||
------------------------------------------------------------
|
||||
|
||||
Hello from Docker!
|
||||
This message shows that your installation appears to be working correctly.
|
||||
|
||||
To generate this message, Docker took the following steps:
|
||||
1. The Docker client contacted the Docker daemon.
|
||||
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
|
||||
(arm64v8)
|
||||
3. The Docker daemon created a new container from that image which runs the
|
||||
executable that produces the output you are currently reading.
|
||||
4. The Docker daemon streamed that output to the Docker client, which sent it
|
||||
to your terminal.
|
||||
|
||||
To try something more ambitious, you can run an Ubuntu container with:
|
||||
$ docker run -it ubuntu bash
|
||||
|
||||
Share images, automate workflows, and more with a free Docker ID:
|
||||
https://hub.docker.com/
|
||||
|
||||
For more examples and ideas, visit:
|
||||
https://docs.docker.com/get-started/
|
||||
|
||||
[OK] docker run 成功 ✓
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] [3/5] 测试 docker commit...
|
||||
------------------------------------------------------------
|
||||
sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
|
||||
[OK] docker commit 成功 -> hello-world-committed:test ✓
|
||||
[OK] commit 镜像验证通过 ✓
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] [4/5] 测试 docker export...
|
||||
------------------------------------------------------------
|
||||
[OK] docker export 成功 -> /tmp/docker_test_232981/hello-world-export.tar (16K) ✓
|
||||
[OK] docker export 成功 ✓
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] [5/5] 测试 docker save & load...
|
||||
------------------------------------------------------------
|
||||
[OK] docker save 成功 -> /tmp/docker_test_232981/hello-world-load.tar (20K) ✓
|
||||
Untagged: hello-world-committed:test
|
||||
Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
|
||||
[INFO] 已删除本地镜像 hello-world-committed:test, 准备 load...
|
||||
Loaded image: hello-world-committed:test
|
||||
[OK] docker save 成功 ✓
|
||||
[OK] docker load 成功, 镜像已恢复 ✓
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] [附] docker push 说明
|
||||
------------------------------------------------------------
|
||||
[WARN] docker push 需要登录镜像仓库, 本脚本不自动执行
|
||||
推送到 Docker Hub:
|
||||
docker login
|
||||
docker tag hello-world-committed:test <你的用户名>/hello-world:test
|
||||
docker push <你的用户名>/hello-world:test
|
||||
|
||||
推送到私有仓库:
|
||||
docker tag hello-world-committed:test <仓库地址>/hello-world:test
|
||||
docker push <仓库地址>/hello-world:test
|
||||
|
||||
------------------------------------------------------------
|
||||
[INFO] 清理测试资源...
|
||||
------------------------------------------------------------
|
||||
docker_test_232981
|
||||
[OK] 容器已清理 ✓
|
||||
Untagged: hello-world-committed:test
|
||||
Deleted: sha256:c2c4ee901ecc7b04b29dadeebc5547450e00ec105d63b7109a462e60628fb866
|
||||
[OK] commit 镜像已清理 ✓
|
||||
[OK] 临时文件已清理 ✓
|
||||
|
||||
============================================================
|
||||
[OK] 全部测试通过!
|
||||
============================================================
|
||||
|
||||
测试结果汇总:
|
||||
✓ docker pull - 镜像拉取
|
||||
✓ docker run - 容器运行
|
||||
✓ docker commit - 容器提交为镜像
|
||||
✓ docker export - 容器导出为 tar
|
||||
✓ docker save - 镜像保存为 tar
|
||||
✓ docker load - 从 tar 加载镜像
|
||||
- docker push - 需手动配置仓库后执行
|
||||
```

---

## BPU Model Performance Testing Tool

### Tool Overview

Built on the `hrt_model_exec perf` command, the tool benchmarks a given `.hbm` or `.bin` model file with 1, 2, 3, and 4 threads in turn, reports the average latency and FPS for each thread count, and saves the results to a JSON file.

| Hardware platform | BPU architecture | Image name |
|----------|----------|----------|
| RDK S100 / RDK S100P | Nash-e / Nash-m | `hrt_perf_nashem:v3.7.3` |
| RDK X5 / RDK X5 Module | Bayes-e | `hrt_perf_bayese:v1.24.5` |
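
The thread sweep described above can be sketched as a list of command lines, one per thread count. This is not the actual `perf.py`; the function name is hypothetical. The flags `--model_file`, `--thread_num`, and `--frame_count` are the ones this README uses when invoking `hrt_model_exec perf` by hand.

```python
THREAD_COUNTS = (1, 2, 3, 4)


def build_perf_commands(model_file, frame_count=200, thread_counts=THREAD_COUNTS):
    """Return one `hrt_model_exec perf` argument list per thread count."""
    return [
        [
            "hrt_model_exec", "perf",
            "--model_file", model_file,
            "--thread_num", str(threads),
            "--frame_count", str(frame_count),
        ]
        for threads in thread_counts
    ]
```

Each list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` inside the container, where the binary is available.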

### Directory Structure

```
bpu_model_perf_images/
├── Dockerfile.nashem_perf          # Dockerfile for RDK S100 (Nash-e / Nash-m)
├── Dockerfile.bayese_perf          # Dockerfile for RDK X5 (Bayes-e)
├── docker_build_nashem.sh          # Image build script for RDK S100
├── docker_build_bayese.sh          # Image build script for RDK X5
├── docker_run_nashem_perf.sh       # Automated perf run script for RDK S100
├── docker_run_bayese_perf.sh       # Automated perf run script for RDK X5
├── docker_test_nashem.sh           # Interactive container script for RDK S100
├── docker_test_bayese.sh           # Interactive container script for RDK X5
├── workspace/
│   ├── perf.py                     # Main benchmark script (shared by both platforms)
│   └── entrypoint.sh               # Container entrypoint script (shared by both platforms)
├── example_fs/                     # Example input/output for RDK S100
│   ├── input/
│   │   ├── task.json               # Input protocol file
│   │   └── *.hbm                   # Model files (place them here)
│   └── output/
│       └── result.json             # Generated automatically after a run
└── example_fs_bayese/              # Example input/output for RDK X5
    ├── input/
    │   ├── task.json               # Input protocol file
    │   └── *.hbm / *.bin           # Model files (place them here)
    └── output/
        └── result.json             # Generated automatically after a run
```

### Usage on the RDK S100 Series (Nash-e / Nash-m)

**Step 1: Build the image** (only needs to be done once; afterwards the image can be delivered as-is)

```bash
cd bpu_model_perf_images
bash docker_build_nashem.sh
```

The build script locates `hrt_model_exec` on the board automatically (via `which hrt_model_exec`) and packages it into the image; no manual steps are needed.

**Step 2: Prepare the input**

Place the model file (`.hbm`) in `example_fs/input/` and edit `example_fs/input/task.json`:

```json
{
  "model_relative_path": "your_model.hbm",
  "frame_count": 200
}
```

**Step 3: Run**

```bash
bash docker_run_nashem_perf.sh
```

The run script mounts `example_fs/input` at `/workspace/input` inside the container and `example_fs/output` at `/workspace/output`.

**Expected output**

```text
============================================================
task.json content:
{
  "model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
  "frame_count": 200
}
============================================================
[OK] 1 model(s) validated, frame_count=200

[Benchmarking] yolo11n_detect_nashe_640x640_nv12.hbm (/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm)
thread_num=1 ... FPS=625.20 avg_latency=1.538ms
thread_num=2 ... FPS=1133.43 avg_latency=1.703ms
thread_num=3 ... FPS=1171.23 avg_latency=2.482ms
thread_num=4 ... FPS=1167.98 avg_latency=3.322ms

================================================================================
+---------------------------------------+-----------+------------------+------------+
| Model                                 | Threads   | Avg Latency(ms)  | FPS        |
+---------------------------------------+-----------+------------------+------------+
| yolo11n_detect_nashe_640x640_nv12.hbm | 1         | 1.538            | 625.20     |
|                                       | 2         | 1.703            | 1133.43    |
|                                       | 3         | 2.482            | 1171.23    |
|                                       | 4         | 3.322            | 1167.98    |
+---------------------------------------+-----------+------------------+------------+

Results saved to: /workspace/output/result.json
```
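
In this sample run, throughput levels off after two to three threads while average latency keeps rising. A short sketch makes that easy to read from the numbers; the values are copied from the output above and the helper names are hypothetical.

```python
# (thread_num, fps, avg_latency_ms) copied from the sample output above.
SAMPLE = [
    (1, 625.20, 1.538),
    (2, 1133.43, 1.703),
    (3, 1171.23, 2.482),
    (4, 1167.98, 3.322),
]


def best_by_fps(rows):
    """Return the row with the highest FPS."""
    return max(rows, key=lambda row: row[1])


def speedups(rows):
    """Return FPS of each row relative to the single-thread row, rounded to 2 places."""
    base = next(fps for threads, fps, _ in rows if threads == 1)
    return {threads: round(fps / base, 2) for threads, fps, _ in rows}
```

For this sample, `best_by_fps(SAMPLE)` picks the 3-thread row. `speedups(SAMPLE)` shows roughly 1.8x at two threads and under 1.9x at three and four.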

### Usage on the RDK X5 Series (Bayes-e)

**Step 1: Build the image**

```bash
cd bpu_model_perf_images
bash docker_build_bayese.sh
```

**Step 2: Prepare the input**

Place the model file (`.hbm` or `.bin`) in `example_fs_bayese/input/` and edit `example_fs_bayese/input/task.json`:

```json
{
  "model_relative_path": "your_model.hbm",
  "frame_count": 200
}
```

**Step 3: Run**

```bash
bash docker_run_bayese_perf.sh
```

### Entering the Container Interactively

The image ships with `hrt_model_exec` and other BPU tools, so you can enter the container interactively and debug models by hand.

**RDK S100 (Nash-e / Nash-m)**

```bash
cd bpu_model_perf_images
bash docker_test_nashem.sh
```

**RDK X5 (Bayes-e)**

```bash
cd bpu_model_perf_images
bash docker_test_bayese.sh
```

Once inside the container, `hrt_model_exec` can be used directly, for example:

```bash
# Show help
hrt_model_exec --help

# Run perf manually
hrt_model_exec perf --model_file /workspace/input/xxx.hbm --thread_num 2 --frame_count 200

# Run inference manually
hrt_model_exec infer --model_file /workspace/input/xxx.hbm --input_file input.jpg
```

> The container starts in `--privileged` mode with host directories such as `/usr/hobot` and `/opt/hobot` mounted, so the libraries are identical to those on the board. The container is removed automatically on exit (`--rm`).
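
Because the container relies on host directories for its libraries, it can help to confirm they exist before starting it. With `-v`, Docker creates a missing host path as an empty directory, so a missing directory would only show up later as a library load failure. The sketch below is a hypothetical helper, not part of the repository; the paths are the ones mounted by the scripts in this commit.

```python
import os

# Host directories mounted by the run and interactive scripts in this commit.
HOST_MOUNTS = ("/opt/hobot", "/usr/hobot", "/opt/tros", "/lib/aarch64-linux-gnu")


def missing_mounts(paths=HOST_MOUNTS, is_dir=os.path.isdir):
    """Return the host paths that are not existing directories."""
    return [path for path in paths if not is_dir(path)]
```

An empty result from `missing_mounts()` on the board means every mount source is present.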

### task.json Protocol

Two formats are supported:

**Single model**

| Field | Type | Required | Description |
|------|------|------|------|
| `model_relative_path` | string | Yes | Model file name, relative to the input directory; the suffix must be `.hbm` or `.bin` |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |

```json
{
  "model_relative_path": "model.hbm",
  "frame_count": 200
}
```

**Multiple models**

| Field | Type | Required | Description |
|------|------|------|------|
| `models` | list | Yes | List of models |
| `models[].path` | string | Yes | Model file name, relative to the input directory |
| `models[].name` | string | No | Display name; defaults to the file name |
| `frame_count` | int | No | Number of frames to test; defaults to 200 |

```json
{
  "models": [
    {"name": "model_a", "path": "model_a.hbm"},
    {"name": "model_b", "path": "model_b.bin"}
  ],
  "frame_count": 200
}
```

> Note: `model_relative_path` and `models` must not appear together.
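
The rules in the two tables can be summarized as a small normalizer that turns either format into one list of models. This is a sketch of the protocol as documented here, not the validation code in `perf.py`. The function name is hypothetical, and applying the `.hbm` / `.bin` suffix check to `models[].path` is an assumption, since the table states it only for `model_relative_path`.

```python
import os

ALLOWED_SUFFIXES = (".hbm", ".bin")
DEFAULT_FRAME_COUNT = 200


def normalize_task(task):
    """Return (models, frame_count); models is a list of {'name', 'path'} dicts."""
    has_single = "model_relative_path" in task
    has_multi = "models" in task
    if has_single == has_multi:
        # Either both keys are present or neither is.
        raise ValueError("provide exactly one of 'model_relative_path' or 'models'")
    if has_single:
        entries = [{"path": task["model_relative_path"]}]
    else:
        entries = task["models"]
    models = []
    for entry in entries:
        path = entry["path"]
        if not path.endswith(ALLOWED_SUFFIXES):
            raise ValueError(f"unsupported model suffix: {path}")
        # The display name defaults to the file name.
        name = entry.get("name", os.path.basename(path))
        models.append({"name": name, "path": path})
    return models, int(task.get("frame_count", DEFAULT_FRAME_COUNT))
```

Normalizing both formats to one shape means the benchmark loop only has to handle a list of models.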

### Output Format

Results are saved to `output/result.json` with the following structure:

```json
[
  {
    "model_name": "model.hbm",
    "model_path": "/workspace/input/model.hbm",
    "perf_results": [
      {
        "thread_num": 1,
        "frame_count": 200,
        "run_time_ms": 320.18,
        "total_latency_ms": 307.53,
        "avg_latency_ms": 1.538,
        "fps": 625.197,
        "raw_output": "...",
        "returncode": 0
      }
    ]
  }
]
```
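
The numeric fields correspond to lines in the text that `hrt_model_exec perf` prints, which is stored verbatim in `raw_output`. The sketch below shows one way to pull those values out with regular expressions. The patterns follow the wording in the example `result.json` from this commit; this is not the parser used by `perf.py`, and the names are hypothetical.

```python
import re

# Patterns follow the wording of the raw_output in the example result.json.
PATTERNS = {
    "run_time_ms": r"Program run time:\s*([\d.]+)\s*ms",
    "total_latency_ms": r"Frame totally latency is:\s*([\d.]+)\s*ms",
    "avg_latency_ms": r"Average latency is:\s*([\d.]+)\s*ms",
    "fps": r"Frame rate is:\s*([\d.]+)\s*FPS",
}


def parse_perf_output(raw_output):
    """Extract the perf metrics from raw tool output; missing values become None."""
    metrics = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, raw_output)
        metrics[key] = float(match.group(1)) if match else None
    return metrics
```

Returning `None` for a missing metric lets the caller tell a failed run (non-zero `returncode`) apart from a parsing problem.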
|
||||
|
||||
65
bpu_model_perf_images/Dockerfile.bayese_perf
Normal file
@@ -0,0 +1,65 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# ============================================================
# Base tools
# SSH server installation and configuration
# ============================================================
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-venv \
    git \
    wget \
    curl \
    vim \
    sshpass \
    i2c-tools \
    software-properties-common \
    gnupg2 \
    lsb-release \
    openssh-server \
    locales \
    && mkdir -p /var/run/sshd \
    && rm -rf /var/lib/apt/lists/*

# Configure the Chinese locale (prevents garbled Chinese text in the terminal)
RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LC_ALL=zh_CN.UTF-8

# SSH configuration: allow root password login, plus retry and connection limits
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
    && sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
    && echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
    && echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
    && echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
    && echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
    && echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config

# Generate SSH host keys (otherwise generated on first container start; pre-generating speeds up startup)
RUN ssh-keygen -A

# Set the root password
RUN echo 'root:root' | chpasswd

# ============================================================
# BPU model perf tool
# ============================================================

# Copy hrt_model_exec (taken from its path on the host board)
COPY hrt_model_exec /usr/local/bin/hrt_model_exec
RUN chmod +x /usr/local/bin/hrt_model_exec

# Copy the perf script and the entrypoint
COPY workspace/perf.py /workspace/perf/perf.py
COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
RUN chmod +x /workspace/perf/entrypoint.sh

# Working directory and mount points
RUN mkdir -p /workspace/input /workspace/output

WORKDIR /workspace/perf

ENTRYPOINT ["/workspace/perf/entrypoint.sh"]
||||
66
bpu_model_perf_images/Dockerfile.nashem_perf
Normal file
@@ -0,0 +1,66 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# ============================================================
# Base tools
# SSH server installation and configuration
# ============================================================
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-venv \
    git \
    wget \
    curl \
    vim \
    sshpass \
    i2c-tools \
    software-properties-common \
    gnupg2 \
    lsb-release \
    openssh-server \
    locales \
    && mkdir -p /var/run/sshd \
    && rm -rf /var/lib/apt/lists/*

# Configure the Chinese locale (prevents garbled Chinese text in the terminal)
RUN locale-gen zh_CN.UTF-8 && update-locale LANG=zh_CN.UTF-8
ENV LANG=zh_CN.UTF-8
ENV LC_ALL=zh_CN.UTF-8

# SSH configuration: allow root password login, plus retry and connection limits
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
    && sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config \
    && echo "MaxAuthTries 3" >> /etc/ssh/sshd_config \
    && echo "MaxStartups 3:50:10" >> /etc/ssh/sshd_config \
    && echo "LoginGraceTime 30" >> /etc/ssh/sshd_config \
    && echo "ClientAliveInterval 60" >> /etc/ssh/sshd_config \
    && echo "ClientAliveCountMax 3" >> /etc/ssh/sshd_config

# Generate SSH host keys (otherwise generated on first container start; pre-generating speeds up startup)
RUN ssh-keygen -A

# Set the root password
RUN echo 'root:root' | chpasswd

# ============================================================
# BPU model perf tool
# ============================================================

# Copy hrt_model_exec (taken from its path on the host board)
COPY hrt_model_exec /usr/local/bin/hrt_model_exec
RUN chmod +x /usr/local/bin/hrt_model_exec

# Copy the perf script and the entrypoint
COPY workspace/perf.py /workspace/perf/perf.py
COPY workspace/entrypoint.sh /workspace/perf/entrypoint.sh
RUN chmod +x /workspace/perf/entrypoint.sh

# Working directory and mount points
RUN mkdir -p /workspace/input /workspace/output

WORKDIR /workspace/perf

ENTRYPOINT ["/workspace/perf/entrypoint.sh"]

14
bpu_model_perf_images/docker_build_bayese.sh
Normal file
@@ -0,0 +1,14 @@
#!/bin/bash
set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Copy hrt_model_exec into build context
HRT_BIN="$(which hrt_model_exec)"
echo "Copying hrt_model_exec from $HRT_BIN"
cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"

# Reference the Dockerfile by absolute path so the script works from any directory
docker build -f "$SCRIPT_DIR/Dockerfile.bayese_perf" -t hrt_perf_bayese:v1.24.5 "$SCRIPT_DIR"

# Clean up copied binary
rm -f "$SCRIPT_DIR/hrt_model_exec"
14
bpu_model_perf_images/docker_build_nashem.sh
Normal file
@@ -0,0 +1,14 @@
#!/bin/bash
set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# Copy hrt_model_exec into build context
HRT_BIN="$(which hrt_model_exec)"
echo "Copying hrt_model_exec from $HRT_BIN"
cp "$HRT_BIN" "$SCRIPT_DIR/hrt_model_exec"

# Reference the Dockerfile by absolute path so the script works from any directory
docker build -f "$SCRIPT_DIR/Dockerfile.nashem_perf" -t hrt_perf_nashem:v3.7.3 "$SCRIPT_DIR"

# Clean up copied binary
rm -f "$SCRIPT_DIR/hrt_model_exec"
16
bpu_model_perf_images/docker_run_bayese_perf.sh
Normal file
@@ -0,0 +1,16 @@
#!/bin/bash
# Run the bayese perf container
# Input:  /workspace/input/task.json    (mounted from example_fs_bayese/input on the host)
# Output: /workspace/output/result.json (mounted from example_fs_bayese/output on the host)

docker run --rm \
    --name hrt-perf-bayese \
    --privileged \
    -v /opt/hobot:/opt/hobot:ro \
    -v /usr/hobot:/usr/hobot:ro \
    -v /opt/tros:/opt/tros:ro \
    -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
    -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
    -v "$(pwd)/example_fs_bayese/input:/workspace/input:ro" \
    -v "$(pwd)/example_fs_bayese/output:/workspace/output" \
    hrt_perf_bayese:v1.24.5
16
bpu_model_perf_images/docker_run_nashem_perf.sh
Normal file
@@ -0,0 +1,16 @@
#!/bin/bash
# Run the nashem perf container
# Input:  /workspace/input/task.json    (mounted from host)
# Output: /workspace/output/result.json (mounted from host)

docker run --rm \
    --name hrt-perf-nashem \
    --privileged \
    -v /opt/hobot:/opt/hobot:ro \
    -v /usr/hobot:/usr/hobot:ro \
    -v /opt/tros:/opt/tros:ro \
    -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
    -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
    -v "$(pwd)/example_fs/input:/workspace/input:ro" \
    -v "$(pwd)/example_fs/output:/workspace/output" \
    hrt_perf_nashem:v3.7.3
15
bpu_model_perf_images/docker_test_bayese.sh
Normal file
@@ -0,0 +1,15 @@
#!/bin/bash
# Start the Bayes-e perf container interactively, with /bin/bash as the entrypoint
# Used for working with the BPU interactively inside the container (hrt_model_exec and other tools)

docker run -it --rm \
    --name hrt-perf-bayese-interactive \
    --privileged \
    -v /opt/hobot:/opt/hobot:ro \
    -v /usr/hobot:/usr/hobot:ro \
    -v /opt/tros:/opt/tros:ro \
    -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
    -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
    -e ROS_DOMAIN_ID=0 \
    --entrypoint /bin/bash \
    hrt_perf_bayese:v1.24.5
15
bpu_model_perf_images/docker_test_nashem.sh
Normal file
@@ -0,0 +1,15 @@
#!/bin/bash
# Start the Nash-e / Nash-m perf container interactively, with /bin/bash as the entrypoint
# Used for working with the BPU interactively inside the container (hrt_model_exec and other tools)

docker run -it --rm \
    --name hrt-perf-nashem-interactive \
    --privileged \
    -v /opt/hobot:/opt/hobot:ro \
    -v /usr/hobot:/usr/hobot:ro \
    -v /opt/tros:/opt/tros:ro \
    -v /lib/aarch64-linux-gnu:/host_lib/aarch64-linux-gnu:ro \
    -e LD_LIBRARY_PATH="/usr/hobot/lib:/host_lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu" \
    -e ROS_DOMAIN_ID=0 \
    --entrypoint /bin/bash \
    hrt_perf_nashem:v3.7.3
4
bpu_model_perf_images/example_fs/input/task.json
Normal file
@@ -0,0 +1,4 @@
{
  "model_relative_path": "yolo11n_detect_nashe_640x640_nv12.hbm",
  "frame_count": 20
}
Binary file not shown.
48
bpu_model_perf_images/example_fs/output/result.json
Normal file
@@ -0,0 +1,48 @@
|
||||
[
|
||||
{
|
||||
"model_name": "yolo11n_detect_nashe_640x640_nv12.hbm",
|
||||
"model_path": "/workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm",
|
||||
"perf_results": [
|
||||
{
|
||||
"thread_num": 1,
|
||||
"frame_count": 20,
|
||||
"run_time_ms": 32.467,
|
||||
"total_latency_ms": 30.841,
|
||||
"avg_latency_ms": 1.542,
|
||||
"fps": 620.79,
|
||||
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 1 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473143438048][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 352.489ms.\n\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][9][03-18][09:19:21:855][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 1\n Frame count is: 20\n Program run time: 32.467 ms\nPerf result:\n Frame totally latency is: 30.841 ms\n Average latency is: 1.542 ms\n Frame rate is: 620.790 FPS\n",
|
||||
"returncode": 0
|
||||
},
|
||||
{
|
||||
"thread_num": 2,
|
||||
"frame_count": 20,
|
||||
"run_time_ms": 19.705,
|
||||
"total_latency_ms": 36.596,
|
||||
"avg_latency_ms": 1.837,
|
||||
"fps": 1016.369,
|
||||
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 2 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472894728928][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 356.068ms.\n\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][53][03-18][09:19:23:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 2\n Frame count is: 20\n Program run time: 19.705 ms\nPerf result:\n Frame totally latency is: 36.596 ms\n Average latency is: 1.837 ms\n Frame rate is: 1016.369 FPS\n",
"returncode": 0
},
{
"thread_num": 3,
"frame_count": 20,
"run_time_ms": 18.084,
"total_latency_ms": 49.101,
"avg_latency_ms": 2.456,
"fps": 1111.797,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 3 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281473576172256][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.722ms.\n\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][98][03-18][09:19:25:890][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 3\n Frame count is: 20\n Program run time: 18.084 ms\nPerf result:\n Frame totally latency is: 49.101 ms\n Average latency is: 2.456 ms\n Frame rate is: 1111.797 FPS\n",
"returncode": 0
},
{
"thread_num": 4,
"frame_count": 20,
"run_time_ms": 18.174,
"total_latency_ms": 64.363,
"avg_latency_ms": 3.217,
"fps": 1092.826,
"raw_output": "[UCP]: log level = 3\n[UCP]: UCP version = 3.7.3\n[VP]: log level = 3\n[DNN]: log level = 3\n[HPL]: log level = 3\n[UCPT]: log level = 6\nhrt_model_exec perf --model_file /workspace/input/yolo11n_detect_nashe_640x640_nv12.hbm --thread_num 4 --frame_count 20\n\n\u001b[1;33m [Warning]: These operators have range limitations on input data: \u001b[0m\n\u001b[1;33m [Acos, Acosh, Asin, Atanh, BevPoolV2, Div, Gather, GatherElements, GatherND, GridSample, ImageDecoder, IndexSelect, Log, Mod, Pow, Reciprocal, RoiAlign, ScatterElements, ScatterND, Slice, Sqrt, Tan, Tile, Topk, Upsample]. \u001b[0m\n\u001b[1;33m Please make sure that these operators are not in your model, when no input data is provided to the tool. \u001b[0m\n\u001b[1;33m [Suggestion]: Using --input_file command to specify perf input data, which can appoint valid input data. \u001b[0m\n\n[BPU][[BPU_MONITOR]][281472884636384][INFO]BPULib verison(2, 1, 2)[0d3f195]!\n[DNN] HBTL_EXT_DNN log level:6\n[DNN]: 3.7.3_(4.2.11 HBRT)\nLoad model to DDR cost 347.186ms.\n\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[0] stride is dynamic, but you did not specify the stride, set as (409600,640,1,1)\n\u001b[m\u001b[K\u001b[32m[I][144][03-18][09:19:27:914][tensor_util.cpp:321][hrt_model_exec][HRT_MODEL_EXEC] Input[1] stride is dynamic, but you did not specify the stride, set as (204800,640,2,1)\n\u001b[m\u001b[K\nRunning condition:\n Thread number is: 4\n Frame count is: 20\n Program run time: 18.174 ms\nPerf result:\n Frame totally latency is: 64.363 ms\n Average latency is: 3.217 ms\n Frame rate is: 1092.826 FPS\n",
"returncode": 0
}
]
}
]
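The result file above is a list of per-model entries, each with a `perf_results` array holding one record per thread count. The following is a minimal sketch of how such a file might be post-processed to find the thread count with the highest throughput. The helper name `best_by_fps` is hypothetical, the `model_name` value is assumed from the model file path seen in `raw_output`, and the inline sample is a trimmed copy of the four measurements shown above with `raw_output` omitted.

```python
import json

# Trimmed copy of the measurements shown above (raw_output omitted).
# The model_name value is an assumption derived from the model file path.
SAMPLE = json.loads("""
[
  {
    "model_name": "yolo11n_detect_nashe_640x640_nv12.hbm",
    "perf_results": [
      {"thread_num": 1, "avg_latency_ms": 1.542, "fps": 620.790},
      {"thread_num": 2, "avg_latency_ms": 1.837, "fps": 1016.369},
      {"thread_num": 3, "avg_latency_ms": 2.456, "fps": 1111.797},
      {"thread_num": 4, "avg_latency_ms": 3.217, "fps": 1092.826}
    ]
  }
]
""")


def best_by_fps(entry):
    """Return the perf record with the highest FPS, skipping failed runs."""
    ok = [p for p in entry["perf_results"] if p.get("fps") is not None]
    return max(ok, key=lambda p: p["fps"]) if ok else None


for entry in SAMPLE:
    best = best_by_fps(entry)
    if best is not None:
        print(f'{entry["model_name"]}: {best["thread_num"]} threads, '
              f'{best["fps"]:.3f} FPS, {best["avg_latency_ms"]:.3f} ms avg latency')
```

In this sample, throughput peaks at three threads, while average latency grows steadily with the thread count (1.542 ms at one thread, 3.217 ms at four).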
4
bpu_model_perf_images/example_fs_bayese/input/task.json
Normal file
@ -0,0 +1,4 @@
{
  "model_relative_path": "your_bayese_model.hbm",
  "frame_count": 200
}
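The example above uses the single-model form. According to `validate_config` in `perf.py` from this same commit, a `models` list is also accepted, where each entry has a required `path` and an optional `name`, and the two forms must not be combined. The following is a minimal sketch that generates the multi-model form. The helper `build_task` and the model file names are hypothetical and only illustrate the shape of the file.

```python
import json
from pathlib import Path


def build_task(model_paths, frame_count=200):
    """Build a multi-model task dict; each name defaults to the file name."""
    return {
        "models": [{"name": Path(p).name, "path": p} for p in model_paths],
        "frame_count": frame_count,
    }


# Hypothetical model files, given relative to the directory holding task.json.
task = build_task(["model_a.hbm", "subdir/model_b.bin"])
print(json.dumps(task, indent=2))
```

Paths are resolved relative to the directory that contains `task.json`, and only the `.hbm` and `.bin` extensions pass validation.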
20
bpu_model_perf_images/workspace/entrypoint.sh
Normal file
@ -0,0 +1,20 @@
#!/bin/bash
set -e

INPUT_DIR="/workspace/input"
OUTPUT_DIR="/workspace/output"
INPUT_JSON="${INPUT_DIR}/task.json"
OUTPUT_JSON="${OUTPUT_DIR}/result.json"

# Allow overrides via env vars
INPUT_JSON="${PERF_INPUT:-$INPUT_JSON}"
OUTPUT_JSON="${PERF_OUTPUT:-$OUTPUT_JSON}"

if [ ! -f "$INPUT_JSON" ]; then
    echo "ERROR: input file not found: $INPUT_JSON"
    exit 1
fi

mkdir -p "$(dirname "$OUTPUT_JSON")"

exec python3 /workspace/perf/perf.py --input "$INPUT_JSON" --output "$OUTPUT_JSON"
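The `${PERF_INPUT:-$INPUT_JSON}` expansion in the script above falls back to the default when the variable is unset or set to an empty string. The following Python sketch mirrors that rule, for readers who want the same override behaviour outside the shell. The helper name `resolve` and its `env` parameter are hypothetical and not part of the commit.

```python
import os


def resolve(var, default, env=None):
    """Mimic shell ${var:-default}: use default when var is unset OR empty."""
    env = os.environ if env is None else env
    value = env.get(var, "")
    return value if value else default


# With no override the defaults from entrypoint.sh apply.
print(resolve("PERF_INPUT", "/workspace/input/task.json", env={}))
# An explicit override wins.
print(resolve("PERF_OUTPUT", "/workspace/output/result.json",
              env={"PERF_OUTPUT": "/tmp/result.json"}))
```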
221
bpu_model_perf_images/workspace/perf.py
Normal file
@ -0,0 +1,221 @@
#!/usr/bin/env python3
"""
BPU Model Performance Benchmark Tool
Runs hrt_model_exec perf for each model at thread counts 1,2,3,4
"""

import argparse
import json
import re
import subprocess
import sys
from pathlib import Path

HRT_MODEL_EXEC = "hrt_model_exec"
THREAD_COUNTS = [1, 2, 3, 4]


def run_perf(model_path: str, thread_num: int, frame_count: int = 200) -> dict:
    """Run hrt_model_exec perf and return parsed results."""
    cmd = [
        HRT_MODEL_EXEC, "perf",
        "--model_file", model_path,
        "--thread_num", str(thread_num),
        "--frame_count", str(frame_count),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr

    perf = {
        "thread_num": thread_num,
        "frame_count": frame_count,
        "run_time_ms": None,
        "total_latency_ms": None,
        "avg_latency_ms": None,
        "fps": None,
        "raw_output": output,
        "returncode": result.returncode,
    }

    m = re.search(r"Program run time:\s*([\d.]+)\s*ms", output)
    if m:
        perf["run_time_ms"] = float(m.group(1))

    m = re.search(r"Frame totally latency is:\s*([\d.]+)\s*ms", output)
    if m:
        perf["total_latency_ms"] = float(m.group(1))

    m = re.search(r"Average\s+latency\s+is:\s*([\d.]+)\s*ms", output)
    if m:
        perf["avg_latency_ms"] = float(m.group(1))

    m = re.search(r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS", output)
    if m:
        perf["fps"] = float(m.group(1))

    return perf


def print_table(results: list):
    """Print results as a human-readable table."""
    # Dynamically compute the width of the model column
    max_name_len = max(len(e["model_name"]) for e in results)
    col_widths = [max(max_name_len, 10), 9, 16, 10]
    headers = ["Model", "Threads", "Avg Latency(ms)", "FPS"]

    sep = "+" + "+".join("-" * (w + 2) for w in col_widths) + "+"
    header_row = "|" + "|".join(
        f" {h:<{w}} " for h, w in zip(headers, col_widths)
    ) + "|"

    for entry in results:
        name = entry["model_name"]
        # Print a separate header for each model
        print(sep)
        print(header_row)
        print(sep)
        for p in entry["perf_results"]:
            avg = f"{p['avg_latency_ms']:.3f}" if p["avg_latency_ms"] is not None else "N/A"
            fps = f"{p['fps']:.2f}" if p["fps"] is not None else "N/A"
            row = [name, str(p["thread_num"]), avg, fps]
            print("|" + "|".join(
                f" {v:<{w}} " for v, w in zip(row, col_widths)
            ) + "|")
            name = ""
    print(sep)


def validate_config(config: dict, input_dir: Path) -> list:
    """Print and validate task.json, return normalized model list. Exit on error."""
    print("=" * 60)
    print("task.json content:")
    print(json.dumps(config, indent=2, ensure_ascii=False))
    print("=" * 60)

    errors = []

    # --- frame_count ---
    if "frame_count" in config:
        if not isinstance(config["frame_count"], int) or config["frame_count"] <= 0:
            errors.append(" [frame_count] must be a positive integer")

    # --- Determine the format (single model or model list) ---
    has_single = "model_relative_path" in config
    has_multi = "models" in config

    if not has_single and not has_multi:
        errors.append(" missing required field: 'model_relative_path' or 'models'")
        for e in errors:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)

    if has_single and has_multi:
        errors.append(" ambiguous: both 'model_relative_path' and 'models' are present, use one")

    models = []

    if has_single:
        rel = config["model_relative_path"]
        if not isinstance(rel, str) or not rel.strip():
            errors.append(" [model_relative_path] must be a non-empty string")
        elif not rel.endswith((".hbm", ".bin")):
            errors.append(f" [model_relative_path] unsupported extension: '{rel}' (expected .hbm or .bin)")
        else:
            models = [{"name": Path(rel).name, "path": rel}]

    if has_multi:
        if not isinstance(config["models"], list) or len(config["models"]) == 0:
            errors.append(" [models] must be a non-empty list")
        else:
            for i, m in enumerate(config["models"]):
                prefix = f" [models[{i}]]"
                if not isinstance(m, dict):
                    errors.append(f"{prefix} each entry must be an object")
                    continue
                if "path" not in m:
                    errors.append(f"{prefix} missing required field 'path'")
                elif not isinstance(m["path"], str) or not m["path"].strip():
                    errors.append(f"{prefix} 'path' must be a non-empty string")
                elif not m["path"].endswith((".hbm", ".bin")):
                    errors.append(f"{prefix} unsupported extension: '{m['path']}' (expected .hbm or .bin)")
                else:
                    models.append({"name": m.get("name", Path(m["path"]).name), "path": m["path"]})

    if errors:
        for e in errors:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)

    # --- Check that the model files exist ---
    missing = []
    for m in models:
        full = input_dir / m["path"]
        if not full.exists():
            missing.append(f" model file not found: {full}")
    if missing:
        for e in missing:
            print(f"[ERROR] {e}", file=sys.stderr)
        sys.exit(1)

    print(f"[OK] {len(models)} model(s) validated, frame_count={config.get('frame_count', 200)}\n")
    return models


def main():
    parser = argparse.ArgumentParser(description="BPU model perf benchmark")
    parser.add_argument("--input", required=True, help="Input JSON file")
    parser.add_argument("--output", required=True, help="Output JSON file")
    args = parser.parse_args()

    if not Path(args.input).exists():
        print(f"[ERROR] input file not found: {args.input}", file=sys.stderr)
        sys.exit(1)

    with open(args.input) as f:
        config = json.load(f)

    input_dir = Path(args.input).parent
    frame_count = config.get("frame_count", 200)
    models = validate_config(config, input_dir)

    if not models:
        print("[ERROR] no models specified in input JSON", file=sys.stderr)
        sys.exit(1)

    output_results = []

    for model in models:
        rel_path = model["path"]
        model_path = str(input_dir / rel_path)
        model_name = model.get("name", Path(rel_path).name)
        print(f"\n[Benchmarking] {model_name} ({model_path})")

        perf_results = []
        for t in THREAD_COUNTS:
            print(f" thread_num={t} ...", end=" ", flush=True)
            p = run_perf(model_path, t, frame_count)
            perf_results.append(p)
            if p["fps"] is not None:
                print(f"FPS={p['fps']:.2f} avg_latency={p['avg_latency_ms']:.3f}ms")
            else:
                print("FAILED (check raw_output in result JSON)")

        output_results.append({
            "model_name": model_name,
            "model_path": model_path,
            "perf_results": perf_results,
        })

    print("\n" + "=" * 80)
    print_table(output_results)

    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output_results, f, indent=2)

    print(f"\nResults saved to: {args.output}")


if __name__ == "__main__":
    main()
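`run_perf` above extracts four numbers from the `hrt_model_exec perf` summary with regular expressions. The following is a minimal sketch that exercises the same four patterns against the summary tail of the `thread_num=1` run recorded in the example result file. The table-driven `PATTERNS` dict and the helper `parse_summary` are an illustrative restructuring, not the commit's code, and the sample text is copied as it appears in this page, so its spacing may differ from the tool's real output.

```python
import re

# Summary tail of the thread_num=1 run, as recorded in the example result.json.
SAMPLE_TAIL = (
    "Running condition:\n"
    " Thread number is: 1\n"
    " Frame count is: 20\n"
    " Program run time: 32.467 ms\n"
    "Perf result:\n"
    " Frame totally latency is: 30.841 ms\n"
    " Average latency is: 1.542 ms\n"
    " Frame rate is: 620.790 FPS\n"
)

# The same regular expressions used by run_perf, keyed by output field.
PATTERNS = {
    "run_time_ms": r"Program run time:\s*([\d.]+)\s*ms",
    "total_latency_ms": r"Frame totally latency is:\s*([\d.]+)\s*ms",
    "avg_latency_ms": r"Average\s+latency\s+is:\s*([\d.]+)\s*ms",
    "fps": r"Frame\s+rate\s+is:\s*([\d.]+)\s*FPS",
}


def parse_summary(text):
    """Return the four metrics as floats, or None where a pattern is absent."""
    parsed = {}
    for key, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        parsed[key] = float(m.group(1)) if m else None
    return parsed


parsed = parse_summary(SAMPLE_TAIL)
print(parsed)
```

Because every field defaults to `None` when its pattern is missing, a failed run still produces a complete record, which matches how `main` reports `FAILED` when `fps` is `None`.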