feat(pipe): Add layout generation by text desc pipe and release v0.1.3. (#33)
* feat(pipe): Add layout generation pipe.
* tag v0.1.3
* chore: update assets
* update

Co-authored-by: xinjie.wang <xinjie.wang@gpu-4090-dev015.hogpu.cc>
This commit is contained in:
parent
e05095f51f
commit
e06900fe57
CHANGELOG.md (12 changed lines)
@@ -4,8 +4,18 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0).

+## [0.1.3] - 2025-08-19
+
+### Feature
+
+- Release `layout-cli` 🏞️ Layout (Interactive 3D Worlds) Generation: generate an interactive 3D scene from a task description.
+
-## [0.1.1] - 2025-07-xx
+## [0.1.2] - 2025-07-31

 ### Feature

 - Release `scene3d-cli` 🌍 3D Scene Generation: generate a color mesh and 3DGS scene from a text description.
 - Release Dockerfile and Docker image.

 ## [0.1.1] - 2025-07-11

 ### Feature

 - Added intelligent quality checkers and auto-retry pipeline for `image-to-3d` and `text-to-3d`.
 - Added unit tests for quality checkers.
README.md (45 changed lines)
@@ -30,7 +30,7 @@
 ```sh
 git clone https://github.com/HorizonRobotics/EmbodiedGen.git
 cd EmbodiedGen
-git checkout v0.1.2
+git checkout v0.1.3
 git submodule update --init --recursive --progress
 conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env.
 conda activate embodiedgen
@@ -194,8 +194,6 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \

 ### 💬 Generate Layout from task description

-🚧 *Coming Soon*
-
 <table>
 <tr>
 <td><img src="apps/assets/layout1.gif" alt="layout1" width="360"/></td>
@@ -207,6 +205,47 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \
 </tr>
 </table>

+The text-to-image model is based on SD3.5 Medium; using it requires agreement to the [model license](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium). All models are downloaded automatically on the first run.
+
+You can generate any desired room as a background using `scene3d-cli`. Since each scene takes approximately 30 minutes to generate, we recommend pre-generating backgrounds for efficiency and adding them to `outputs/bg_scenes/scene_list.txt`.
+
+We provide sample background assets created with `scene3d-cli`. Download them (~4 GB) with `hf download xinjjj/scene3d-bg --repo-type dataset --local-dir outputs`.
+
+Generating one interactive 3D scene from a task description with `layout-cli` takes approximately 30 minutes.
+
+```sh
+layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on the table on the plate" \
+    --bg_list "outputs/bg_scenes/scene_list.txt" --output_root "outputs/layouts_gen" --insert_robot
+```
+
+<table>
+<tr>
+<td><img src="apps/assets/Iscene_demo1.gif" alt="Iscene_demo1" width="300"/></td>
+<td><img src="apps/assets/Iscene_demo2.gif" alt="Iscene_demo2" width="450"/></td>
+</tr>
+</table>
+
+Run multiple tasks defined in `task_list.txt` in the background.
+Omit `--insert_robot` if you do not want the robot pose considered in layout generation.
+
+```sh
+CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
+    --task_descs "apps/assets/example_layout/task_list.txt" \
+    --bg_list "outputs/bg_scenes/scene_list.txt" \
+    --output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &
+```
+
+With `compose_layout.py` you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.)
+
+```sh
+python embodied_gen/scripts/compose_layout.py \
+    --layout_path "outputs/layouts_gens/task_0000/layout.json" \
+    --output_dir "outputs/layouts_gens/task_0000/recompose" --insert_robot
+```
+
+We provide `sim-cli`, which lets users easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (support for more simulators is planned in future updates).
+
+```sh
+sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \
+    --output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka"
+```
+
 ### 🖼️ Real-to-Sim Digital Twin
BIN  apps/assets/Iscene_demo1.gif  (new file; binary not shown; 471 KiB)
BIN  apps/assets/Iscene_demo2.gif  (new file; binary not shown; 875 KiB)
apps/assets/example_layout/task_list.txt (18 lines, new file)

@@ -0,0 +1,18 @@
+Place the pen in the mug on the desk
+Get the pen from the table and put it inside the cup
+Put the fruit on the table on the plate
+Put the apples on the table on the plate
+Put the spoon on the plate in the table
+Put the green lid on the desk
+Put the pen on the tray
+Move the plate forward
+Move the lamp to the left
+Pick up the cup and put on the middle of the table
+Close the laptop
+Close the red jar with the lid on the table
+Pick the cloth and shove it in the box
+Pick up the marker from the table and put it in the bowl
+Pick up the charger and move it slightly to the left
+Move the jar to the left side of the desk
+Pick the rubik's cube on the top of the desk
+Move the mug to the right
@@ -189,7 +189,7 @@ os.makedirs(TMP_DIR, exist_ok=True)
 lighting_css = """
 <style>
 #lighter_mesh canvas {
-    filter: brightness(1.8) !important;
+    filter: brightness(1.9) !important;
 }
 </style>
 """
@@ -51,17 +51,15 @@ class RenderResult:

     def __post_init__(self):
         if isinstance(self.rgb, torch.Tensor):
-            rgb = self.rgb.detach().cpu().numpy()
-            rgb = (rgb * 255).astype(np.uint8)
-            self.rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)
+            rgb = (self.rgb * 255).to(torch.uint8)
+            self.rgb = rgb.cpu().numpy()[..., ::-1]
         if isinstance(self.depth, torch.Tensor):
-            self.depth = self.depth.detach().cpu().numpy()
+            self.depth = self.depth.cpu().numpy()
         if isinstance(self.opacity, torch.Tensor):
-            opacity = self.opacity.detach().cpu().numpy()
-            opacity = (opacity * 255).astype(np.uint8)
-            self.opacity = cv2.cvtColor(opacity, cv2.COLOR_GRAY2RGB)
+            opacity = (self.opacity * 255).to(torch.uint8)
+            self.opacity = opacity.cpu().numpy()

         mask = np.where(self.opacity > self.mask_threshold, 255, 0)
-        self.mask = mask[..., 0:1].astype(np.uint8)
+        self.mask = mask.astype(np.uint8)
         self.rgba = np.concatenate([self.rgb, self.mask], axis=-1)
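The refactored `__post_init__` above replaces `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)` with a plain channel reversal (`[..., ::-1]`). For 3-channel images the two are equivalent; this NumPy-only sketch (an editor's illustration, not part of the commit) demonstrates the reversal:

```python
import numpy as np

# A tiny fake 2x2 BGR image with distinct values per channel.
bgr = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# Reversing the last axis swaps the B and R channels, which is exactly
# what a BGR-to-RGB color conversion does for 3-channel inputs.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # first pixel's channels, reversed
```

The same trick is what lets the new code stay entirely on the tensor/array level instead of round-tripping through OpenCV.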
embodied_gen/models/layout.py (509 lines, new file)
@@ -0,0 +1,509 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.


import argparse
import json
import logging
import os
import re

import json_repair

from embodied_gen.utils.enum import (
    LayoutInfo,
    RobotItemEnum,
    Scene3DItemEnum,
    SpatialRelationEnum,
)
from embodied_gen.utils.gpt_clients import GPT_CLIENT, GPTclient
from embodied_gen.utils.process_media import SceneTreeVisualizer

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


__all__ = [
    "LayoutDesigner",
    "LAYOUT_DISASSEMBLER",
    "LAYOUT_GRAPHER",
    "LAYOUT_DESCRIBER",
]


DISTRACTOR_NUM = 3  # Maximum number of distractor objects allowed
LAYOUT_DISASSEMBLE_PROMPT = f"""
You are an intelligent 3D scene planner. Given a natural language
description of a robotic task, output a structured description of
an interactive 3D scene.

The output must include the following fields:
- task: A high-level task type (e.g., "single-arm pick",
  "dual-arm grasping", "pick and place", "object sorting").
- {Scene3DItemEnum.ROBOT}: The name or type of robot involved. If not mentioned,
  use {RobotItemEnum.FRANKA} as the default.
- {Scene3DItemEnum.BACKGROUND}: The room or indoor environment where the task happens
  (e.g., Kitchen, Bedroom, Living Room, Workshop, Office).
- {Scene3DItemEnum.CONTEXT}: An indoor object involved in the manipulation
  (e.g., Table, Shelf, Desk, Bed, Cabinet).
- {Scene3DItemEnum.MANIPULATED_OBJS}: The main object(s) that the robot directly interacts with.
- {Scene3DItemEnum.DISTRACTOR_OBJS}: Other objects that naturally belong to the scene but are not part of the main task.

Constraints:
- The {Scene3DItemEnum.BACKGROUND} must logically match the described task.
- The {Scene3DItemEnum.CONTEXT} must fit within the {Scene3DItemEnum.BACKGROUND} (e.g., a bedroom may include a table or bed, but not a workbench).
- The {Scene3DItemEnum.CONTEXT} must be a concrete indoor object, such as a "table",
  "shelf", "desk", or "bed". It must not be an abstract concept (e.g., "area", "space", "zone")
  or a structural surface (e.g., "floor", "ground"). If the input describes an interaction near
  the floor or a vague space, you must infer a plausible object such as a "table", "cabinet", or "storage box" instead.
- {Scene3DItemEnum.MANIPULATED_OBJS} and {Scene3DItemEnum.DISTRACTOR_OBJS} must be plausible
  and semantically compatible with the {Scene3DItemEnum.CONTEXT} and {Scene3DItemEnum.BACKGROUND}.
- {Scene3DItemEnum.DISTRACTOR_OBJS} must not be confused with or overlap the manipulated objects.
- {Scene3DItemEnum.DISTRACTOR_OBJS} number limit: {DISTRACTOR_NUM} distractors maximum.
- All {Scene3DItemEnum.BACKGROUND} values are limited to indoor environments.
- {Scene3DItemEnum.MANIPULATED_OBJS} and {Scene3DItemEnum.DISTRACTOR_OBJS} must be rigid bodies and must not include flexible objects.
- {Scene3DItemEnum.MANIPULATED_OBJS} and {Scene3DItemEnum.DISTRACTOR_OBJS} must be common
  household or office items or furniture, not abstract concepts, and not too small (e.g., a needle).
- If the input includes a plural or grouped object (e.g., "pens", "bottles", "plates", "fruit"),
  you must decompose it into multiple individual instances (e.g., ["pen", "pen"], ["apple", "pear"]).
- Containers that hold objects (e.g., "bowl of apples", "box of tools") must
  be separated into individual items (e.g., ["bowl", "apple", "apple"]).
- Do not include transparent objects such as "glass", "plastic", etc.
- The output must be compact JSON wrapped in Markdown syntax, exactly like the outputs in the examples below.

Examples:

Input:
"Pick up the marker from the table and put it in the bowl, robot {RobotItemEnum.UR5}."
Output:
```json
{{
    "task_desc": "Pick up the marker from the table and put it in the bowl.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.UR5}",
    "{Scene3DItemEnum.BACKGROUND}": "kitchen",
    "{Scene3DItemEnum.CONTEXT}": "table",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["marker"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["mug", "notebook", "bowl"]
}}
```

Input:
"Put the rubik's cube on the top of the shelf."
Output:
```json
{{
    "task_desc": "Put the rubik's cube on the top of the shelf.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.FRANKA}",
    "{Scene3DItemEnum.BACKGROUND}": "bedroom",
    "{Scene3DItemEnum.CONTEXT}": "shelf",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["rubik's cube"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["pen", "cup", "toy car"]
}}
```

Input:
"Remove all the objects from the white basket and put them on the table."
Output:
```json
{{
    "task_desc": "Remove all the objects from the white basket and put them on the table, robot {RobotItemEnum.PIPER}.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.PIPER}",
    "{Scene3DItemEnum.BACKGROUND}": "office",
    "{Scene3DItemEnum.CONTEXT}": "table",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["banana", "mobile phone"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["plate", "white basket"]
}}
```

Input:
"Pick up the rope on the chair and put it in the box."
Output:
```json
{{
    "task_desc": "Pick up the rope on the chair and put it in the box, robot {RobotItemEnum.FRANKA}.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.FRANKA}",
    "{Scene3DItemEnum.BACKGROUND}": "living room",
    "{Scene3DItemEnum.CONTEXT}": "chair",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["rope", "box"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["magazine"]
}}
```

Input:
"Pick up the seal tape and plastic from the counter and put them in the open drawer and close it."
Output:
```json
{{
    "task_desc": "Pick up the seal tape and plastic from the counter and put them in the open drawer and close it.",
    "task": "pick and place",
    "robot": "franka",
    "background": "kitchen",
    "context": "counter",
    "manipulated_objs": ["seal tape", "plastic", "opened drawer"],
    "distractor_objs": ["scissors"]
}}
```

Input:
"Put the pens in the grey bowl."
Output:
```json
{{
    "task_desc": "Put the pens in the grey bowl.",
    "task": "pick and place",
    "robot": "franka",
    "background": "office",
    "context": "table",
    "manipulated_objs": ["pen", "pen", "grey bowl"],
    "distractor_objs": ["notepad", "cup"]
}}
```

"""
LAYOUT_HIERARCHY_PROMPT = f"""
You are a 3D scene layout reasoning expert.
Your task is to generate a spatial relationship dictionary, as a multiway tree,
that describes how objects are arranged in a 3D environment
based on a given task description and object list.

The input is in JSON format, containing the task description, task type,
{Scene3DItemEnum.ROBOT}, {Scene3DItemEnum.BACKGROUND}, {Scene3DItemEnum.CONTEXT},
and a list of objects, including {Scene3DItemEnum.MANIPULATED_OBJS} and {Scene3DItemEnum.DISTRACTOR_OBJS}.

### Supported Spatial Relations:
- "{SpatialRelationEnum.ON}": The child object's bottom rests directly on the parent object's top.
- "{SpatialRelationEnum.INSIDE}": The child object is inside the context object.
- "{SpatialRelationEnum.IN}": The {Scene3DItemEnum.ROBOT} is in the {Scene3DItemEnum.BACKGROUND}.
- "{SpatialRelationEnum.FLOOR}": The child object's bottom is on the floor of the {Scene3DItemEnum.BACKGROUND}.

### Rules:
- The {Scene3DItemEnum.CONTEXT} object must be "{SpatialRelationEnum.FLOOR}" the {Scene3DItemEnum.BACKGROUND}.
- {Scene3DItemEnum.MANIPULATED_OBJS} and {Scene3DItemEnum.DISTRACTOR_OBJS} must be either
  "{SpatialRelationEnum.ON}" or "{SpatialRelationEnum.INSIDE}" the {Scene3DItemEnum.CONTEXT},
  or "{SpatialRelationEnum.FLOOR}" the {Scene3DItemEnum.BACKGROUND}.
- Use "{SpatialRelationEnum.INSIDE}" only if the parent is a container-like object (e.g., shelf, rack, cabinet).
- Do not define relationship edges between arbitrary objects, only between child and parent nodes.
- The {Scene3DItemEnum.ROBOT} must be "{SpatialRelationEnum.IN}" the {Scene3DItemEnum.BACKGROUND}.
- Ensure that each object appears only once in the layout tree, and its spatial relationship is defined with only one parent.
- Ensure a valid multiway tree structure with a maximum depth of 2 levels, suitable for a 3D scene layout representation.
- Only output the final result in JSON format, using Markdown syntax as in the examples.

### Example
Input:
{{
    "task_desc": "Pick up the marker from the table and put it in the bowl.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.FRANKA}",
    "{Scene3DItemEnum.BACKGROUND}": "kitchen",
    "{Scene3DItemEnum.CONTEXT}": "table",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["marker", "bowl"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["mug", "chair"]
}}
Intermediate Think:
table {SpatialRelationEnum.FLOOR} kitchen
chair {SpatialRelationEnum.FLOOR} kitchen
{RobotItemEnum.FRANKA} {SpatialRelationEnum.IN} kitchen
marker {SpatialRelationEnum.ON} table
bowl {SpatialRelationEnum.ON} table
mug {SpatialRelationEnum.ON} table
Final Output:
```json
{{
    "kitchen": [
        ["table", "{SpatialRelationEnum.FLOOR}"],
        ["chair", "{SpatialRelationEnum.FLOOR}"],
        ["{RobotItemEnum.FRANKA}", "{SpatialRelationEnum.IN}"]
    ],
    "table": [
        ["marker", "{SpatialRelationEnum.ON}"],
        ["bowl", "{SpatialRelationEnum.ON}"],
        ["mug", "{SpatialRelationEnum.ON}"]
    ]
}}
```

Input:
{{
    "task_desc": "Put the marker on top of the book.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.UR5}",
    "{Scene3DItemEnum.BACKGROUND}": "office",
    "{Scene3DItemEnum.CONTEXT}": "desk",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["marker", "book"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["pen holder", "notepad"]
}}
Intermediate Think:
desk {SpatialRelationEnum.FLOOR} office
{RobotItemEnum.UR5} {SpatialRelationEnum.IN} office
marker {SpatialRelationEnum.ON} desk
book {SpatialRelationEnum.ON} desk
pen holder {SpatialRelationEnum.ON} desk
notepad {SpatialRelationEnum.ON} desk
Final Output:
```json
{{
    "office": [
        ["desk", "{SpatialRelationEnum.FLOOR}"],
        ["{RobotItemEnum.UR5}", "{SpatialRelationEnum.IN}"]
    ],
    "desk": [
        ["marker", "{SpatialRelationEnum.ON}"],
        ["book", "{SpatialRelationEnum.ON}"],
        ["pen holder", "{SpatialRelationEnum.ON}"],
        ["notepad", "{SpatialRelationEnum.ON}"]
    ]
}}
```

Input:
{{
    "task_desc": "Put the rubik's cube on the top of the shelf.",
    "task": "pick and place",
    "{Scene3DItemEnum.ROBOT}": "{RobotItemEnum.UR5}",
    "{Scene3DItemEnum.BACKGROUND}": "bedroom",
    "{Scene3DItemEnum.CONTEXT}": "shelf",
    "{Scene3DItemEnum.MANIPULATED_OBJS}": ["rubik's cube"],
    "{Scene3DItemEnum.DISTRACTOR_OBJS}": ["toy car", "pen"]
}}
Intermediate Think:
shelf {SpatialRelationEnum.FLOOR} bedroom
{RobotItemEnum.UR5} {SpatialRelationEnum.IN} bedroom
rubik's cube {SpatialRelationEnum.INSIDE} shelf
toy car {SpatialRelationEnum.INSIDE} shelf
pen {SpatialRelationEnum.INSIDE} shelf
Final Output:
```json
{{
    "bedroom": [
        ["shelf", "{SpatialRelationEnum.FLOOR}"],
        ["{RobotItemEnum.UR5}", "{SpatialRelationEnum.IN}"]
    ],
    "shelf": [
        ["rubik's cube", "{SpatialRelationEnum.INSIDE}"],
        ["toy car", "{SpatialRelationEnum.INSIDE}"],
        ["pen", "{SpatialRelationEnum.INSIDE}"]
    ]
}}
```

Input:
{{
    "task_desc": "Put the marker in the cup on the counter.",
    "task": "pick and place",
    "robot": "franka",
    "background": "kitchen",
    "context": "counter",
    "manipulated_objs": ["marker", "cup"],
    "distractor_objs": ["plate", "spoon"]
}}
Intermediate Think:
counter {SpatialRelationEnum.FLOOR} kitchen
{RobotItemEnum.FRANKA} {SpatialRelationEnum.IN} kitchen
marker {SpatialRelationEnum.ON} counter
cup {SpatialRelationEnum.ON} counter
plate {SpatialRelationEnum.ON} counter
spoon {SpatialRelationEnum.ON} counter
Final Output:
```json
{{
    "kitchen": [
        ["counter", "{SpatialRelationEnum.FLOOR}"],
        ["{RobotItemEnum.FRANKA}", "{SpatialRelationEnum.IN}"]
    ],
    "counter": [
        ["marker", "{SpatialRelationEnum.ON}"],
        ["cup", "{SpatialRelationEnum.ON}"],
        ["plate", "{SpatialRelationEnum.ON}"],
        ["spoon", "{SpatialRelationEnum.ON}"]
    ]
}}
```

"""
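The single-parent and maximum-depth constraints stated in this prompt can be checked mechanically on the model's output. A minimal stdlib sketch (an editor's illustration, not part of the commit; `validate_layout_tree` is a hypothetical helper and assumes object names in the tree are unique):

```python
def validate_layout_tree(tree: dict) -> bool:
    """Check a layout dict is a multiway tree of depth <= 2:
    exactly one root, and every object has a single parent."""
    children = [child for pairs in tree.values() for child, _rel in pairs]
    # Each object appears only once, i.e. has exactly one parent.
    if len(children) != len(set(children)):
        return False
    roots = [node for node in tree if node not in children]
    if len(roots) != 1:
        return False
    root = roots[0]
    # Depth <= 2: any non-root parent must hang directly off the root.
    root_children = {child for child, _rel in tree[root]}
    return all(node == root or node in root_children for node in tree)


layout_tree = {
    "kitchen": [["table", "floor"], ["chair", "floor"], ["franka", "in"]],
    "table": [["marker", "on"], ["bowl", "on"], ["mug", "on"]],
}
print(validate_layout_tree(layout_tree))
```

Duplicate instances produced by plural decomposition (e.g. two "pen" entries) would need unique suffixes before this check applies.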
LAYOUT_DESCRIBER_PROMPT = """
You are a 3D asset style descriptor.

Given a task description and a dictionary where the key is the object content and
the value is the object type, output a JSON dictionary with each object paired
with a concise, styled visual description suitable for 3D asset generation.

Generation Guidelines:
- For each object, brainstorm multiple style candidates before selecting the final
  description. Vary phrasing, material, texture, color, and spatial details.
- Each description must be a maximum of 15 words, including color, style, and materials.
- Descriptions should be visually grounded, specific, and reflect surface texture and structure.
- For objects marked as "context", explicitly mention that the object is standalone and has an empty top.
- Use rich style descriptors, e.g., "scratched brown wooden desk".
- Ensure all object styles align with the task's overall context and environment.

Format your output in JSON like the example below.

Example Input:
"Pick up the rope on the chair and put it in the box. {'living room': 'background', 'chair': 'context',
'rope': 'manipulated_objs', 'box': 'manipulated_objs', 'magazine': 'distractor_objs'}"

Example Output:
```json
{
    "living room": "modern cozy living room with soft sunlight and light grey carpet",
    "chair": "standalone dark oak chair with no surroundings and clean empty seat",
    "rope": "twisted hemp rope with rough fibers and dusty beige texture",
    "box": "slightly crumpled cardboard box with open flaps and brown textured surface",
    "magazine": "celebrity magazine with glossy red cover and large bold title"
}
```
"""
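The 15-word cap in the guidelines above is easy to enforce after generation. A small sketch (an editor's illustration, not part of the commit; `over_limit` is a hypothetical helper):

```python
def over_limit(desc_map: dict, max_words: int = 15) -> list:
    """Return the objects whose generated description exceeds the word cap."""
    return [obj for obj, desc in desc_map.items() if len(desc.split()) > max_words]


descs = {
    "chair": "standalone dark oak chair with no surroundings and clean empty seat",
    "rope": "twisted hemp rope with rough fibers and dusty beige texture",
}
print(over_limit(descs))  # both descriptions fit within 15 words
```

A check like this could feed a retry loop when the LLM overruns the limit.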
class LayoutDesigner(object):
    def __init__(
        self,
        gpt_client: GPTclient,
        system_prompt: str,
        verbose: bool = False,
    ) -> None:
        self.prompt = system_prompt.strip()
        self.verbose = verbose
        self.gpt_client = gpt_client

    def query(self, prompt: str, params: dict = None) -> str:
        full_prompt = self.prompt + f"\n\nInput:\n\"{prompt}\""

        response = self.gpt_client.query(
            text_prompt=full_prompt,
            params=params,
        )

        if self.verbose:
            logger.info(f"Response: {response}")

        return response

    def format_response(self, response: str) -> dict:
        cleaned = re.sub(r"^```json\s*|\s*```$", "", response.strip())
        try:
            output = json.loads(cleaned)
        except json.JSONDecodeError as e:
            # Re-raise as ValueError: json.JSONDecodeError requires
            # (msg, doc, pos) and cannot be built from a message alone.
            raise ValueError(
                f"Error: {e}, failed to parse JSON response: {response}"
            ) from e

        return output

    def format_response_repair(self, response: str) -> dict:
        return json_repair.loads(response)

    def save_output(self, output: dict, save_path: str) -> None:
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        with open(save_path, 'w') as f:
            json.dump(output, f, indent=4)

    def __call__(
        self, prompt: str, save_path: str = None, params: dict = None
    ) -> dict | str:
        response = self.query(prompt, params=params)
        output = self.format_response_repair(response)
        if save_path:
            self.save_output(output, save_path)

        return output


LAYOUT_DISASSEMBLER = LayoutDesigner(
    gpt_client=GPT_CLIENT, system_prompt=LAYOUT_DISASSEMBLE_PROMPT
)
LAYOUT_GRAPHER = LayoutDesigner(
    gpt_client=GPT_CLIENT, system_prompt=LAYOUT_HIERARCHY_PROMPT
)
LAYOUT_DESCRIBER = LayoutDesigner(
    gpt_client=GPT_CLIENT, system_prompt=LAYOUT_DESCRIBER_PROMPT
)
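`format_response` above strips a surrounding Markdown ```json fence before parsing. The regex-plus-`json.loads` pattern can be exercised standalone with the stdlib (an editor's sketch mirroring that method, without the `json_repair` fallback; `parse_fenced_json` is a hypothetical name):

```python
import json
import re


def parse_fenced_json(response: str) -> dict:
    """Strip a surrounding ```json ... ``` fence, then parse the remainder.
    Unfenced input passes through the regex unchanged."""
    cleaned = re.sub(r"^```json\s*|\s*```$", "", response.strip())
    return json.loads(cleaned)


raw = '```json\n{"task": "pick and place", "robot": "franka"}\n```'
print(parse_fenced_json(raw)["robot"])
```

`json_repair.loads` additionally tolerates truncated or slightly malformed JSON, which is why the commit uses it as the default path.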
def build_scene_layout(
    task_desc: str, output_path: str = None, gpt_params: dict = None
) -> LayoutInfo:
    layout_relation = LAYOUT_DISASSEMBLER(task_desc, params=gpt_params)
    layout_tree = LAYOUT_GRAPHER(layout_relation, params=gpt_params)
    object_mapping = Scene3DItemEnum.object_mapping(layout_relation)
    obj_prompt = f'{layout_relation["task_desc"]} {object_mapping}'
    objs_desc = LAYOUT_DESCRIBER(obj_prompt, params=gpt_params)
    layout_info = LayoutInfo(
        layout_tree, layout_relation, objs_desc, object_mapping
    )

    if output_path is not None:
        visualizer = SceneTreeVisualizer(layout_info)
        visualizer.render(save_path=output_path)
        logger.info(f"Scene hierarchy tree saved to {output_path}")

    return layout_info
def parse_args():
    parser = argparse.ArgumentParser(description="3D Scene Layout Designer")
    parser.add_argument(
        "--task_desc",
        type=str,
        default="Put the apples on the table on the plate",
        help="Natural language description of the robotic task",
    )
    parser.add_argument(
        "--save_root",
        type=str,
        default="outputs/layout_tree",
        help="Path to save the layout output",
    )
    return parser.parse_args()


if __name__ == "__main__":
    from embodied_gen.utils.enum import LayoutInfo
    from embodied_gen.utils.process_media import SceneTreeVisualizer

    args = parse_args()
    params = {
        "temperature": 1.0,
        "top_p": 0.95,
        "frequency_penalty": 0.3,
        "presence_penalty": 0.5,
    }
    layout_relation = LAYOUT_DISASSEMBLER(args.task_desc, params=params)
    layout_tree = LAYOUT_GRAPHER(layout_relation, params=params)

    object_mapping = Scene3DItemEnum.object_mapping(layout_relation)
    obj_prompt = f'{layout_relation["task_desc"]} {object_mapping}'

    objs_desc = LAYOUT_DESCRIBER(obj_prompt, params=params)

    layout_info = LayoutInfo(layout_tree, layout_relation, objs_desc)

    visualizer = SceneTreeVisualizer(layout_info)
    os.makedirs(args.save_root, exist_ok=True)
    scene_graph_path = f"{args.save_root}/scene_tree.jpg"
    visualizer.render(save_path=scene_graph_path)
    with open(f"{args.save_root}/layout.json", "w") as f:
        json.dump(layout_info.to_dict(), f, indent=4)

    print(f"Scene hierarchy tree saved to {scene_graph_path}")
    print(f"Disassembled Layout: {layout_relation}")
    print(f"Layout Graph: {layout_tree}")
    print(f"Layout Descriptions: {objs_desc}")
embodied_gen/scripts/compose_layout.py (73 lines, new file)
@@ -0,0 +1,73 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import json
import os
from dataclasses import dataclass

import tyro

from embodied_gen.scripts.simulate_sapien import entrypoint as sim_cli
from embodied_gen.utils.enum import LayoutInfo
from embodied_gen.utils.geometry import bfs_placement, compose_mesh_scene
from embodied_gen.utils.log import logger


@dataclass
class LayoutPlacementConfig:
    layout_path: str
    output_dir: str | None = None
    seed: int | None = None
    max_attempts: int = 1000
    output_iscene: bool = False
    insert_robot: bool = False


def entrypoint(**kwargs):
    if kwargs is None or len(kwargs) == 0:
        args = tyro.cli(LayoutPlacementConfig)
    else:
        args = LayoutPlacementConfig(**kwargs)

    output_dir = (
        args.output_dir
        if args.output_dir is not None
        else os.path.dirname(args.layout_path)
    )
    os.makedirs(output_dir, exist_ok=True)
    out_scene_path = f"{output_dir}/Iscene.glb"
    out_layout_path = f"{output_dir}/layout.json"

    with open(args.layout_path, "r") as f:
|
||||||
|
layout_info = LayoutInfo.from_dict(json.load(f))
|
||||||
|
|
||||||
|
layout_info = bfs_placement(layout_info, seed=args.seed)
|
||||||
|
with open(out_layout_path, "w") as f:
|
||||||
|
json.dump(layout_info.to_dict(), f, indent=4)
|
||||||
|
|
||||||
|
if args.output_iscene:
|
||||||
|
compose_mesh_scene(layout_info, out_scene_path)
|
||||||
|
|
||||||
|
sim_cli(
|
||||||
|
layout_path=out_layout_path,
|
||||||
|
output_dir=output_dir,
|
||||||
|
robot_name="franka" if args.insert_robot else None,
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"Layout placement completed in {output_dir}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
entrypoint()
|
||||||
156  embodied_gen/scripts/gen_layout.py  Normal file
@@ -0,0 +1,156 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import gc
import json
import os
from dataclasses import dataclass, field
from shutil import copytree
from time import time
from typing import Optional

import torch
import tyro

from embodied_gen.models.layout import build_scene_layout
from embodied_gen.scripts.simulate_sapien import entrypoint as sim_cli
from embodied_gen.scripts.textto3d import text_to_3d
from embodied_gen.utils.config import GptParamsConfig
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import bfs_placement, compose_mesh_scene
from embodied_gen.utils.gpt_clients import GPT_CLIENT
from embodied_gen.utils.log import logger
from embodied_gen.utils.process_media import (
    load_scene_dict,
    parse_text_prompts,
)
from embodied_gen.validators.quality_checkers import SemanticMatcher

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


@dataclass
class LayoutGenConfig:
    task_descs: list[str]
    output_root: str
    bg_list: str = "outputs/bg_scenes/scene_list.txt"
    n_img_sample: int = 3
    text_guidance_scale: float = 7.0
    img_denoise_step: int = 25
    n_image_retry: int = 4
    n_asset_retry: int = 3
    n_pipe_retry: int = 2
    seed_img: Optional[int] = None
    seed_3d: Optional[int] = None
    seed_layout: Optional[int] = None
    keep_intermediate: bool = False
    output_iscene: bool = False
    insert_robot: bool = False
    gpt_params: GptParamsConfig = field(
        default_factory=lambda: GptParamsConfig(
            temperature=1.0,
            top_p=0.95,
            frequency_penalty=0.3,
            presence_penalty=0.5,
        )
    )


def entrypoint() -> None:
    args = tyro.cli(LayoutGenConfig)
    SCENE_MATCHER = SemanticMatcher(GPT_CLIENT)
    task_descs = parse_text_prompts(args.task_descs)
    scene_dict = load_scene_dict(args.bg_list)
    gpt_params = args.gpt_params.to_dict()
    for idx, task_desc in enumerate(task_descs):
        logger.info(f"Generate Layout and 3D scene for task: {task_desc}")
        output_root = f"{args.output_root}/task_{idx:04d}"
        scene_graph_path = f"{output_root}/scene_tree.jpg"
        start_time = time()
        layout_info: LayoutInfo = build_scene_layout(
            task_desc, scene_graph_path, gpt_params
        )
        prompts_mapping = {v: k for k, v in layout_info.objs_desc.items()}
        prompts = [
            v
            for k, v in layout_info.objs_desc.items()
            if layout_info.objs_mapping[k] != Scene3DItemEnum.BACKGROUND.value
        ]

        for prompt in prompts:
            node = prompts_mapping[prompt]
            generation_log = text_to_3d(
                prompts=[
                    prompt,
                ],
                output_root=output_root,
                asset_names=[
                    node,
                ],
                n_img_sample=args.n_img_sample,
                text_guidance_scale=args.text_guidance_scale,
                img_denoise_step=args.img_denoise_step,
                n_image_retry=args.n_image_retry,
                n_asset_retry=args.n_asset_retry,
                n_pipe_retry=args.n_pipe_retry,
                seed_img=args.seed_img,
                seed_3d=args.seed_3d,
                keep_intermediate=args.keep_intermediate,
            )
            layout_info.assets.update(generation_log["assets"])
            layout_info.quality.update(generation_log["quality"])

        # Background GEN (for efficiency, temp use retrieval instead)
        bg_node = layout_info.relation[Scene3DItemEnum.BACKGROUND.value]
        text = layout_info.objs_desc[bg_node]
        match_key = SCENE_MATCHER.query(text, str(scene_dict))
        match_scene_path = f"{os.path.dirname(args.bg_list)}/{match_key}"
        bg_save_dir = os.path.join(output_root, "background")
        copytree(match_scene_path, bg_save_dir, dirs_exist_ok=True)
        layout_info.assets[bg_node] = bg_save_dir

        # BFS layout placement.
        layout_info = bfs_placement(
            layout_info,
            limit_reach_range=True if args.insert_robot else False,
            seed=args.seed_layout,
        )
        layout_path = f"{output_root}/layout.json"
        with open(layout_path, "w") as f:
            json.dump(layout_info.to_dict(), f, indent=4)

        if args.output_iscene:
            compose_mesh_scene(layout_info, f"{output_root}/Iscene.glb")

        sim_cli(
            layout_path=layout_path,
            output_dir=output_root,
            robot_name="franka" if args.insert_robot else None,
        )

        torch.cuda.empty_cache()
        gc.collect()

        elapsed_time = (time() - start_time) / 60
        logger.info(
            f"Layout generation done for {scene_graph_path}, layout result "
            f"in {layout_path}, finished in {elapsed_time:.2f} mins."
        )

    logger.info(f"All tasks completed in {args.output_root}")


if __name__ == "__main__":
    entrypoint()
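gen_layout only sends non-background nodes to `text_to_3d`; the background itself is retrieved from a scene library via SemanticMatcher rather than generated. The filtering step can be sketched as below, with illustrative node names and a plain string in place of the real `Scene3DItemEnum` value:

```python
# Assumed toy data; the real mapping comes from LayoutInfo.objs_mapping.
BACKGROUND = "background"
objs_desc = {"table": "a wooden table", "room": "a bright kitchen"}
objs_mapping = {"table": "context", "room": BACKGROUND}

# Keep only descriptions of objects that must be generated as 3D assets.
prompts = [
    desc
    for node, desc in objs_desc.items()
    if objs_mapping[node] != BACKGROUND
]
```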
@@ -151,6 +151,9 @@ def entrypoint(**kwargs):
         seg_image.save(seg_path)
 
         seed = args.seed
+        asset_node = "unknown"
+        if isinstance(args.asset_type, list) and args.asset_type[idx]:
+            asset_node = args.asset_type[idx]
         for try_idx in range(args.n_retry):
             logger.info(
                 f"Try: {try_idx + 1}/{args.n_retry}, Seed: {seed}, Prompt: {seg_path}"
@@ -207,7 +210,9 @@ def entrypoint(**kwargs):
                 color_path = os.path.join(output_root, "color.png")
                 render_gs_api(aligned_gs_path, color_path)
 
-                geo_flag, geo_result = GEO_CHECKER([color_path])
+                geo_flag, geo_result = GEO_CHECKER(
+                    [color_path], text=asset_node
+                )
                 logger.warning(
                     f"{GEO_CHECKER.__class__.__name__}: {geo_result} for {seg_path}"
                 )
186  embodied_gen/scripts/simulate_sapien.py  Normal file
@@ -0,0 +1,186 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import json
import os
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Literal

import imageio
import numpy as np
import torch
import tyro
from tqdm import tqdm

from embodied_gen.models.gs_model import GaussianOperator
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import quaternion_multiply
from embodied_gen.utils.log import logger
from embodied_gen.utils.process_media import alpha_blend_rgba
from embodied_gen.utils.simulation import (
    SIM_COORD_ALIGN,
    FrankaPandaGrasper,
    SapienSceneManager,
)


@dataclass
class SapienSimConfig:
    # Simulation settings.
    layout_path: str
    output_dir: str
    sim_freq: int = 200
    sim_step: int = 400
    z_offset: float = 0.004
    init_quat: list[float] = field(
        default_factory=lambda: [0.7071, 0, 0, 0.7071]
    )  # xyzw
    device: str = "cuda"
    robot_name: str | None = None
    control_freq: int = 40
    # Camera settings.
    render_interval: int = 10
    num_cameras: int = 3
    camera_radius: float = 0.9
    camera_height: float = 1.1
    image_hw: tuple[int, int] = (512, 512)
    ray_tracing: bool = True
    fovy_deg: float = 75.0
    camera_target_pt: list[float] = field(
        default_factory=lambda: [0.0, 0.0, 0.9]
    )
    render_keys: list[
        Literal[
            "Color", "Foreground", "Segmentation", "Normal", "Mask", "Depth"
        ]
    ] = field(default_factory=lambda: ["Foreground"])


def entrypoint(**kwargs):
    if kwargs is None or len(kwargs) == 0:
        cfg = tyro.cli(SapienSimConfig)
    else:
        cfg = SapienSimConfig(**kwargs)

    scene_manager = SapienSceneManager(
        cfg.sim_freq, ray_tracing=cfg.ray_tracing
    )
    _ = scene_manager.initialize_circular_cameras(
        num_cameras=cfg.num_cameras,
        radius=cfg.camera_radius,
        height=cfg.camera_height,
        target_pt=cfg.camera_target_pt,
        image_hw=cfg.image_hw,
        fovy_deg=cfg.fovy_deg,
    )
    with open(cfg.layout_path, "r") as f:
        layout_data = json.load(f)
    layout_data: LayoutInfo = LayoutInfo.from_dict(layout_data)

    scene_manager.load_assets_from_layout_file(
        layout_data,
        cfg.z_offset,
        cfg.init_quat,
        cfg.robot_name,
        cfg.control_freq,
    )

    frames = defaultdict(list)
    image_cnt = 0
    for step in tqdm(range(cfg.sim_step), desc="Simulation"):
        scene_manager.scene.step()
        if cfg.robot_name is not None:
            scene_manager.robot.reset(scene_manager.robot.init_qpos)
        if step % cfg.render_interval != 0:
            continue
        scene_manager.scene.update_render()
        image_cnt += 1
        for camera in scene_manager.cameras:
            camera.take_picture()
            images = scene_manager.render_images(
                camera, render_keys=cfg.render_keys
            )
            frames[camera.name].append(images)

    grasp_frames = defaultdict(dict)
    if cfg.robot_name is not None:
        sim_steps_per_control = cfg.sim_freq // cfg.control_freq
        control_timestep = 1.0 / cfg.control_freq
        grasper = FrankaPandaGrasper(
            scene_manager.robot,
            scene_manager,
            sim_steps_per_control,
            control_timestep,
        )
        for node in layout_data.relation[
            Scene3DItemEnum.MANIPULATED_OBJS.value
        ]:
            grasp_frames[node] = grasper.render_grasp(
                scene_manager.actors[node],
                scene_manager.cameras,
                cfg.render_keys,
            )

    if "Foreground" not in cfg.render_keys:
        return

    bg_node = layout_data.relation[Scene3DItemEnum.BACKGROUND.value]
    gs_path = f"{layout_data.assets[bg_node]}/gs_model.ply"
    gs_model: GaussianOperator = GaussianOperator.load_from_ply(gs_path)
    x, y, z, qx, qy, qz, qw = layout_data.position[bg_node]
    qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], cfg.init_quat)
    init_pose = torch.tensor([x, y, z, qx, qy, qz, qw])
    gs_model = gs_model.get_gaussians(instance_pose=init_pose)

    bg_images = dict()
    for camera in scene_manager.cameras:
        Ks = camera.get_intrinsic_matrix()
        c2w = camera.get_model_matrix()
        c2w = c2w @ SIM_COORD_ALIGN
        result = gs_model.render(
            torch.tensor(c2w, dtype=torch.float32).to(cfg.device),
            torch.tensor(Ks, dtype=torch.float32).to(cfg.device),
            image_width=cfg.image_hw[1],
            image_height=cfg.image_hw[0],
        )
        bg_images[camera.name] = result.rgb[..., ::-1]

    video_frames = []
    for camera in scene_manager.cameras:
        # Scene rendering
        for step in tqdm(range(image_cnt), desc="Rendering"):
            rgba = alpha_blend_rgba(
                frames[camera.name][step]["Foreground"],
                bg_images[camera.name],
            )
            video_frames.append(np.array(rgba))
        # Grasp rendering
        for node in grasp_frames:
            for frame in grasp_frames[node][camera.name]:
                rgba = alpha_blend_rgba(
                    frame["Foreground"], bg_images[camera.name]
                )
                video_frames.append(np.array(rgba))

    os.makedirs(cfg.output_dir, exist_ok=True)
    video_path = f"{cfg.output_dir}/Iscene.mp4"
    imageio.mimsave(video_path, video_frames, fps=30)
    logger.info(f"Interactive 3D Scene Visualization saved in {video_path}")


if __name__ == "__main__":
    entrypoint()
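`alpha_blend_rgba` above composites the simulator's foreground render over the Gaussian-splat background image. A per-pixel stand-in for the real helper (which operates on whole image arrays) shows the blend rule:

```python
def alpha_blend_pixel(fg_rgba, bg_rgb):
    """Blend one foreground RGBA pixel over an opaque background pixel."""
    r, g, b, a = fg_rgba
    alpha = a / 255.0  # foreground coverage in [0, 1]
    return tuple(
        round(alpha * f + (1 - alpha) * bgc)
        for f, bgc in zip((r, g, b), bg_rgb)
    )


# Half-transparent red over a blue background.
blended = alpha_blend_pixel((255, 0, 0, 128), (0, 0, 255))
```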
@@ -81,6 +81,7 @@ done
 
 
 # Step 1: Text-to-Image
+echo ${prompt_args}
 eval python3 embodied_gen/scripts/text2image.py \
     --prompts ${prompt_args} \
     --output_root "${output_root}/images" \
@@ -617,7 +617,7 @@ class Runner:
             for rgb, depth in images_cache:
                 depth_normalized = torch.clip(
                     (depth - depth_global_min)
-                    / (depth_global_max - depth_global_min),
+                    / (depth_global_max - depth_global_min + 1e-8),
                     0,
                     1,
                 )
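The `+ 1e-8` added above guards the depth min-max normalization against a zero value range (a constant depth map would otherwise divide by zero). A stdlib sketch of the failure mode and the fix:

```python
def normalize(values, eps=1e-8):
    """Min-max normalize to [0, 1]; eps keeps a zero range finite."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


# Constant input: range is zero, but the epsilon avoids ZeroDivisionError
# and yields all zeros instead of NaN/crash.
flat = normalize([2.0, 2.0, 2.0])
```

For non-degenerate inputs the epsilon perturbs the result by less than one part in 10^8, which is negligible for visualization.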
@@ -30,7 +30,7 @@ from kornia.morphology import dilation
 from PIL import Image
 from embodied_gen.models.sr_model import ImageRealESRGAN
 from embodied_gen.utils.config import Pano2MeshSRConfig
-from embodied_gen.utils.gaussian import compute_pinhole_intrinsics
+from embodied_gen.utils.geometry import compute_pinhole_intrinsics
 from embodied_gen.utils.log import logger
 from thirdparty.pano2room.modules.geo_predictors import PanoJointPredictor
 from thirdparty.pano2room.modules.geo_predictors.PanoFusionDistancePredictor import (
@@ -17,15 +17,27 @@
 from dataclasses import dataclass, field
 from typing import List, Optional, Union
 
+from dataclasses_json import DataClassJsonMixin
 from gsplat.strategy import DefaultStrategy, MCMCStrategy
 from typing_extensions import Literal, assert_never
 
 __all__ = [
+    "GptParamsConfig",
     "Pano2MeshSRConfig",
     "GsplatTrainConfig",
 ]
 
 
+@dataclass
+class GptParamsConfig(DataClassJsonMixin):
+    temperature: float = 0.1
+    top_p: float = 0.1
+    frequency_penalty: float = 0.0
+    presence_penalty: float = 0.0
+    stop: int | None = None
+    max_tokens: int = 500
+
+
 @dataclass
 class Pano2MeshSRConfig:
     mesh_file: str = "mesh_model.ply"
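`GptParamsConfig` inherits `to_dict()` from `DataClassJsonMixin`, which is how gen_layout turns the config into plain GPT-client kwargs (`args.gpt_params.to_dict()`). For a flat config like this one, the stdlib `dataclasses.asdict` gives a rough equivalent (an approximation — the mixin also handles nested and JSON-typed fields):

```python
from dataclasses import asdict, dataclass


@dataclass
class GptParams:
    temperature: float = 0.1
    top_p: float = 0.1
    max_tokens: int = 500


# Override one field, then flatten to a plain dict of kwargs.
params = asdict(GptParams(temperature=1.0))
```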
@@ -102,6 +102,7 @@ class LayoutInfo(DataClassJsonMixin):
     tree: dict[str, list]
     relation: dict[str, str | list[str]]
     objs_desc: dict[str, str] = field(default_factory=dict)
+    objs_mapping: dict[str, str] = field(default_factory=dict)
     assets: dict[str, str] = field(default_factory=dict)
     quality: dict[str, str] = field(default_factory=dict)
     position: dict[str, list[float]] = field(default_factory=dict)
@@ -35,7 +35,6 @@ __all__ = [
     "set_random_seed",
     "export_splats",
     "create_splats_with_optimizers",
-    "compute_pinhole_intrinsics",
     "resize_pinhole_intrinsics",
     "restore_scene_scale_and_position",
 ]
@@ -265,12 +264,12 @@ def create_splats_with_optimizers(
     return splats, optimizers
 
 
-def compute_pinhole_intrinsics(
-    image_w: int, image_h: int, fov_deg: float
+def compute_intrinsics_from_fovy(
+    image_w: int, image_h: int, fovy_deg: float
 ) -> np.ndarray:
-    fov_rad = np.deg2rad(fov_deg)
-    fx = image_w / (2 * np.tan(fov_rad / 2))
-    fy = fx  # assuming square pixels
+    fovy_rad = np.deg2rad(fovy_deg)
+    fy = image_h / (2 * np.tan(fovy_rad / 2))
+    fx = fy * (image_w / image_h)
     cx = image_w / 2
     cy = image_h / 2
     K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
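The rename above also changes the math: the focal length is now derived from the *vertical* FOV (`fy` from `image_h`, matching the `fovy_deg` camera setting used elsewhere in this commit), with `fx` scaled by the aspect ratio, instead of deriving `fx` from the image width. A stdlib-only sketch of the new formula:

```python
import math


def intrinsics_from_fovy(image_w, image_h, fovy_deg):
    """3x3 pinhole intrinsics from image size and vertical field of view."""
    fy = image_h / (2 * math.tan(math.radians(fovy_deg) / 2))
    fx = fy * (image_w / image_h)  # scale fx by aspect ratio, as in the diff
    return [[fx, 0, image_w / 2], [0, fy, image_h / 2], [0, 0, 1]]


# fovy = 90 deg, height 512: fy = 512 / (2 * tan(45 deg)) = 256.
K = intrinsics_from_fovy(640, 512, 90.0)
```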
458  embodied_gen/utils/geometry.py  Normal file
@@ -0,0 +1,458 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import os
import random
from collections import defaultdict, deque
from functools import wraps
from typing import Literal

import numpy as np
import torch
import trimesh
from matplotlib.path import Path
from pyquaternion import Quaternion
from scipy.spatial import ConvexHull
from scipy.spatial.transform import Rotation as R
from shapely.geometry import Polygon

from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.log import logger

__all__ = [
    "bfs_placement",
    "with_seed",
    "matrix_to_pose",
    "pose_to_matrix",
    "quaternion_multiply",
    "check_reachable",
    "compose_mesh_scene",
    "compute_pinhole_intrinsics",
]


def matrix_to_pose(matrix: np.ndarray) -> list[float]:
    """Convert a 4x4 transformation matrix to a pose (x, y, z, qx, qy, qz, qw).

    Args:
        matrix (np.ndarray): 4x4 transformation matrix.

    Returns:
        list[float]: Pose as [x, y, z, qx, qy, qz, qw].
    """
    x, y, z = matrix[:3, 3]
    rot_mat = matrix[:3, :3]
    quat = R.from_matrix(rot_mat).as_quat()
    qx, qy, qz, qw = quat

    return [x, y, z, qx, qy, qz, qw]


def pose_to_matrix(pose: list[float]) -> np.ndarray:
    """Convert a pose (x, y, z, qx, qy, qz, qw) to a 4x4 transformation matrix.

    Args:
        pose (list[float]): Pose as [x, y, z, qx, qy, qz, qw].

    Returns:
        np.ndarray: 4x4 transformation matrix.
    """
    x, y, z, qx, qy, qz, qw = pose
    r = R.from_quat([qx, qy, qz, qw])
    matrix = np.eye(4)
    matrix[:3, :3] = r.as_matrix()
    matrix[:3, 3] = [x, y, z]

    return matrix


def compute_xy_bbox(
    vertices: np.ndarray, col_x: int = 0, col_y: int = 2
) -> list[float]:
    x_vals = vertices[:, col_x]
    y_vals = vertices[:, col_y]
    return x_vals.min(), x_vals.max(), y_vals.min(), y_vals.max()


def has_iou_conflict(
    new_box: list[float],
    placed_boxes: list[list[float]],
    iou_threshold: float = 0.0,
) -> bool:
    new_min_x, new_max_x, new_min_y, new_max_y = new_box
    for min_x, max_x, min_y, max_y in placed_boxes:
        ix1 = max(new_min_x, min_x)
        iy1 = max(new_min_y, min_y)
        ix2 = min(new_max_x, max_x)
        iy2 = min(new_max_y, max_y)
        inter_area = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter_area > iou_threshold:
            return True
    return False
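`has_iou_conflict` above rejects a candidate placement when its ground-plane bounding box overlaps any already-placed box by more than a threshold area. The pairwise overlap test is a standard axis-aligned box intersection; a stdlib-only sketch:

```python
def boxes_overlap(a, b, threshold=0.0):
    """a, b: (min_x, max_x, min_y, max_y). True if intersection area > threshold."""
    ix = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))  # overlap height
    return ix * iy > threshold


hit = boxes_overlap((0, 2, 0, 2), (1, 3, 1, 3))   # 1x1 intersection
miss = boxes_overlap((0, 1, 0, 1), (2, 3, 2, 3))  # disjoint boxes
```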
def with_seed(seed_attr_name: str = "seed"):
    """A parameterized decorator that temporarily sets the random seed."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            seed = kwargs.get(seed_attr_name, None)
            if seed is not None:
                py_state = random.getstate()
                np_state = np.random.get_state()
                torch_state = torch.get_rng_state()

                random.seed(seed)
                np.random.seed(seed)
                torch.manual_seed(seed)
                try:
                    result = func(*args, **kwargs)
                finally:
                    random.setstate(py_state)
                    np.random.set_state(np_state)
                    torch.set_rng_state(torch_state)
                return result
            else:
                return func(*args, **kwargs)

        return wrapper

    return decorator
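The `with_seed` decorator above snapshots the Python, NumPy, and torch RNG states, seeds them for the wrapped call, and restores them afterwards, so a seeded placement cannot perturb the caller's random stream. The same mechanism, sketched for the stdlib `random` module only:

```python
import random
from functools import wraps


def with_seed(seed_kwarg="seed"):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            seed = kwargs.get(seed_kwarg)
            if seed is None:
                return func(*args, **kwargs)
            state = random.getstate()  # snapshot the caller's stream
            random.seed(seed)
            try:
                return func(*args, **kwargs)
            finally:
                random.setstate(state)  # restore even if func raises
        return wrapper
    return decorator


@with_seed()
def sample(seed=None):
    return random.random()
```

Two calls with the same seed return the same value, and the caller's own random sequence continues as if the seeded call never happened.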
def compute_convex_hull_path(
    vertices: np.ndarray,
    z_threshold: float = 0.05,
    interp_per_edge: int = 3,
    margin: float = -0.02,
) -> Path:
    top_vertices = vertices[
        vertices[:, 1] > vertices[:, 1].max() - z_threshold
    ]
    top_xy = top_vertices[:, [0, 2]]

    if len(top_xy) < 3:
        raise ValueError("Not enough points to form a convex hull")

    hull = ConvexHull(top_xy)
    hull_points = top_xy[hull.vertices]

    polygon = Polygon(hull_points)
    polygon = polygon.buffer(margin)
    hull_points = np.array(polygon.exterior.coords)

    dense_points = []
    for i in range(len(hull_points)):
        p1 = hull_points[i]
        p2 = hull_points[(i + 1) % len(hull_points)]
        for t in np.linspace(0, 1, interp_per_edge, endpoint=False):
            pt = (1 - t) * p1 + t * p2
            dense_points.append(pt)

    return Path(np.array(dense_points), closed=True)


def find_parent_node(node: str, tree: dict) -> str | None:
    for parent, children in tree.items():
        if any(child[0] == node for child in children):
            return parent
    return None


def all_corners_inside(hull: Path, box: list, threshold: int = 3) -> bool:
    x1, x2, y1, y2 = box
    corners = [[x1, y1], [x2, y1], [x1, y2], [x2, y2]]

    num_inside = sum(hull.contains_point(c) for c in corners)
    return num_inside >= threshold


def compute_axis_rotation_quat(
    axis: Literal["x", "y", "z"], angle_rad: float
) -> list[float]:
    if axis.lower() == 'x':
        q = Quaternion(axis=[1, 0, 0], angle=angle_rad)
    elif axis.lower() == 'y':
        q = Quaternion(axis=[0, 1, 0], angle=angle_rad)
    elif axis.lower() == 'z':
        q = Quaternion(axis=[0, 0, 1], angle=angle_rad)
    else:
        raise ValueError(f"Unsupported axis '{axis}', must be one of x, y, z")

    return [q.x, q.y, q.z, q.w]


def quaternion_multiply(
    init_quat: list[float], rotate_quat: list[float]
) -> list[float]:
    qx, qy, qz, qw = init_quat
    q1 = Quaternion(w=qw, x=qx, y=qy, z=qz)
    qx, qy, qz, qw = rotate_quat
    q2 = Quaternion(w=qw, x=qx, y=qy, z=qz)
    quat = q2 * q1

    return [quat.x, quat.y, quat.z, quat.w]
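`quaternion_multiply` above composes `rotate_quat` after `init_quat` via pyquaternion (`q2 * q1`). The equivalent Hamilton product written out in plain Python, keeping the same xyzw ordering as the code:

```python
def quat_multiply(init_quat, rotate_quat):
    """Compose rotate_quat after init_quat; both are [x, y, z, w]."""
    x1, y1, z1, w1 = init_quat
    x2, y2, z2, w2 = rotate_quat
    # Hamilton product q2 * q1: apply init_quat first, then rotate_quat.
    return [
        w2 * x1 + x2 * w1 + y2 * z1 - z2 * y1,
        w2 * y1 - x2 * z1 + y2 * w1 + z2 * x1,
        w2 * z1 + x2 * y1 - y2 * x1 + z2 * w1,
        w2 * w1 - x2 * x1 - y2 * y1 - z2 * z1,
    ]


half = 2 ** 0.5 / 2
q90z = [0.0, 0.0, half, half]      # 90 deg about z
q180z = quat_multiply(q90z, q90z)  # composes to 180 deg about z
```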
|
|
||||||
|
|
||||||
|
def check_reachable(
|
||||||
|
base_xyz: np.ndarray,
|
||||||
|
reach_xyz: np.ndarray,
|
||||||
|
min_reach: float = 0.25,
|
||||||
|
max_reach: float = 0.85,
|
||||||
|
) -> bool:
|
||||||
|
"""Check if the target point is within the reachable range."""
|
||||||
|
distance = np.linalg.norm(reach_xyz - base_xyz)
|
||||||
|
|
||||||
|
return min_reach < distance < max_reach
|
||||||
|
|
||||||
|
|
||||||
|
@with_seed("seed")
def bfs_placement(
    layout_info: LayoutInfo,
    floor_margin: float = 0,
    beside_margin: float = 0.1,
    max_attempts: int = 3000,
    rotate_objs: bool = True,
    rotate_bg: bool = True,
    limit_reach_range: bool = True,
    robot_dim: float = 0.12,
    seed: int = None,
) -> LayoutInfo:
    object_mapping = layout_info.objs_mapping
    position = {}  # node: [x, y, z, qx, qy, qz, qw]
    parent_bbox_xy = {}
    placed_boxes_map = defaultdict(list)
    mesh_info = defaultdict(dict)
    robot_node = layout_info.relation[Scene3DItemEnum.ROBOT.value]
    for node in object_mapping:
        if object_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
            bg_quat = (
                compute_axis_rotation_quat(
                    axis="y",
                    angle_rad=np.random.uniform(0, 2 * np.pi),
                )
                if rotate_bg
                else [0, 0, 0, 1]
            )
            bg_quat = [round(q, 4) for q in bg_quat]
            continue

        mesh_path = (
            f"{layout_info.assets[node]}/mesh/{node.replace(' ', '_')}.obj"
        )
        mesh_info[node]["path"] = mesh_path
        mesh = trimesh.load(mesh_path)
        vertices = mesh.vertices
        z1 = np.percentile(vertices[:, 1], 1)
        z2 = np.percentile(vertices[:, 1], 99)

        if object_mapping[node] == Scene3DItemEnum.CONTEXT.value:
            object_quat = [0, 0, 0, 1]
            mesh_info[node]["surface"] = compute_convex_hull_path(vertices)
            # Put robot at the CONTEXT edge.
            x, y = random.choice(mesh_info[node]["surface"].vertices)
            theta = np.arctan2(y, x)
            quat_initial = Quaternion(axis=[0, 0, 1], angle=theta)
            quat_extra = Quaternion(axis=[0, 0, 1], angle=np.pi)
            quat = quat_extra * quat_initial
            _pose = [x, y, z2 - z1, quat.x, quat.y, quat.z, quat.w]
            position[robot_node] = [round(v, 4) for v in _pose]
            node_box = [
                x - robot_dim / 2,
                x + robot_dim / 2,
                y - robot_dim / 2,
                y + robot_dim / 2,
            ]
            placed_boxes_map[node].append(node_box)
        elif rotate_objs:
            # For manipulated and distractor objects, apply random rotation
            angle_rad = np.random.uniform(0, 2 * np.pi)
            object_quat = compute_axis_rotation_quat(
                axis="y", angle_rad=angle_rad
            )
            object_quat_scipy = np.roll(object_quat, 1)  # [w, x, y, z]
            rotation = R.from_quat(object_quat_scipy).as_matrix()
            vertices = np.dot(mesh.vertices, rotation.T)
            z1 = np.percentile(vertices[:, 1], 1)
            z2 = np.percentile(vertices[:, 1], 99)

        x1, x2, y1, y2 = compute_xy_bbox(vertices)
        mesh_info[node]["pose"] = [x1, x2, y1, y2, z1, z2, *object_quat]
        mesh_info[node]["area"] = max(1e-5, (x2 - x1) * (y2 - y1))

    root = list(layout_info.tree.keys())[0]
    queue = deque([((root, None), layout_info.tree.get(root, []))])
    while queue:
        (node, relation), children = queue.popleft()
        if node not in object_mapping:
            continue

        if object_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
            position[node] = [0, 0, floor_margin, *bg_quat]
        else:
            x1, x2, y1, y2, z1, z2, qx, qy, qz, qw = mesh_info[node]["pose"]
            if object_mapping[node] == Scene3DItemEnum.CONTEXT.value:
                position[node] = [0, 0, -round(z1, 4), qx, qy, qz, qw]
                parent_bbox_xy[node] = [x1, x2, y1, y2, z1, z2]
            elif object_mapping[node] in [
                Scene3DItemEnum.MANIPULATED_OBJS.value,
                Scene3DItemEnum.DISTRACTOR_OBJS.value,
            ]:
                parent_node = find_parent_node(node, layout_info.tree)
                parent_pos = position[parent_node]
                (
                    p_x1,
                    p_x2,
                    p_y1,
                    p_y2,
                    p_z1,
                    p_z2,
                ) = parent_bbox_xy[parent_node]

                obj_dx = x2 - x1
                obj_dy = y2 - y1
                hull_path = mesh_info[parent_node].get("surface")
                for _ in range(max_attempts):
                    node_x1 = random.uniform(p_x1, p_x2 - obj_dx)
                    node_y1 = random.uniform(p_y1, p_y2 - obj_dy)
                    node_box = [
                        node_x1,
                        node_x1 + obj_dx,
                        node_y1,
                        node_y1 + obj_dy,
                    ]
                    if hull_path and not all_corners_inside(
                        hull_path, node_box
                    ):
                        continue
                    # Make sure the manipulated object is reachable by robot.
                    if (
                        limit_reach_range
                        and object_mapping[node]
                        == Scene3DItemEnum.MANIPULATED_OBJS.value
                    ):
                        cx = parent_pos[0] + node_box[0] + obj_dx / 2
                        cy = parent_pos[1] + node_box[2] + obj_dy / 2
                        cz = parent_pos[2] + p_z2 - z1
                        robot_pose = position[robot_node][:3]
                        if not check_reachable(
                            base_xyz=np.array(robot_pose),
                            reach_xyz=np.array([cx, cy, cz]),
                        ):
                            continue

                    if not has_iou_conflict(
                        node_box, placed_boxes_map[parent_node]
                    ):
                        z_offset = 0
                        break
                else:
                    logger.warning(
                        f"Cannot place {node} on {parent_node} without overlap"
                        f" after {max_attempts} attempts, place beside {parent_node}."
                    )
                    for _ in range(max_attempts):
                        node_x1 = random.choice(
                            [
                                random.uniform(
                                    p_x1 - obj_dx - beside_margin,
                                    p_x1 - obj_dx,
                                ),
                                random.uniform(p_x2, p_x2 + beside_margin),
                            ]
                        )
                        node_y1 = random.choice(
                            [
                                random.uniform(
                                    p_y1 - obj_dy - beside_margin,
                                    p_y1 - obj_dy,
                                ),
                                random.uniform(p_y2, p_y2 + beside_margin),
                            ]
                        )
                        node_box = [
                            node_x1,
                            node_x1 + obj_dx,
                            node_y1,
                            node_y1 + obj_dy,
                        ]
                        z_offset = -(parent_pos[2] + p_z2)
                        if not has_iou_conflict(
                            node_box, placed_boxes_map[parent_node]
                        ):
                            break

                placed_boxes_map[parent_node].append(node_box)

                abs_cx = parent_pos[0] + node_box[0] + obj_dx / 2
                abs_cy = parent_pos[1] + node_box[2] + obj_dy / 2
                abs_cz = parent_pos[2] + p_z2 - z1 + z_offset
                position[node] = [
                    round(v, 4)
                    for v in [abs_cx, abs_cy, abs_cz, qx, qy, qz, qw]
                ]
                parent_bbox_xy[node] = [x1, x2, y1, y2, z1, z2]

        sorted_children = sorted(
            children, key=lambda x: -mesh_info[x[0]].get("area", 0)
        )
        for child, rel in sorted_children:
            queue.append(((child, rel), layout_info.tree.get(child, [])))

    layout_info.position = position

    return layout_info
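`bfs_placement` defers overlap testing to `has_iou_conflict`, defined elsewhere in the module, whose boxes are axis-aligned `[x1, x2, y1, y2]` rectangles. A minimal sketch of such a conflict test (our assumption: any positive intersection area counts as a conflict; the names `boxes_overlap` and `iou_conflict` are ours):

```python
def boxes_overlap(box_a, box_b):
    """True if two [x1, x2, y1, y2] boxes intersect with positive area."""
    ax1, ax2, ay1, ay2 = box_a
    bx1, bx2, by1, by2 = box_b
    return (min(ax2, bx2) > max(ax1, bx1)) and (min(ay2, by2) > max(ay1, by1))

def iou_conflict(box, placed_boxes):
    # Candidate placement conflicts if it overlaps any already-placed box.
    return any(boxes_overlap(box, other) for other in placed_boxes)

print(iou_conflict([0, 1, 0, 1], [[0.5, 1.5, 0.5, 1.5]]))  # True: half-overlapping
print(iou_conflict([0, 1, 0, 1], [[2, 3, 2, 3]]))          # False: disjoint
```

Boxes that merely touch along an edge have zero intersection area and do not conflict under this test.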
def compose_mesh_scene(
    layout_info: LayoutInfo, out_scene_path: str, with_bg: bool = False
) -> None:
    object_mapping = Scene3DItemEnum.object_mapping(layout_info.relation)
    scene = trimesh.Scene()
    for node in layout_info.assets:
        if object_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
            mesh_path = f"{layout_info.assets[node]}/mesh_model.ply"
            if not with_bg:
                continue
        else:
            mesh_path = (
                f"{layout_info.assets[node]}/mesh/{node.replace(' ', '_')}.obj"
            )

        mesh = trimesh.load(mesh_path)
        offset = np.array(layout_info.position[node])[[0, 2, 1]]
        mesh.vertices += offset
        scene.add_geometry(mesh, node_name=node)

    os.makedirs(os.path.dirname(out_scene_path), exist_ok=True)
    scene.export(out_scene_path)
    logger.info(f"Composed interactive 3D layout saved in {out_scene_path}")

    return
def compute_pinhole_intrinsics(
    image_w: int, image_h: int, fov_deg: float
) -> np.ndarray:
    fov_rad = np.deg2rad(fov_deg)
    fx = image_w / (2 * np.tan(fov_rad / 2))
    fy = fx  # assuming square pixels
    cx = image_w / 2
    cy = image_h / 2
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])

    return K
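`compute_pinhole_intrinsics` follows the standard pinhole model, `fx = W / (2 tan(fov / 2))`. For a 512x512 image with a 90° FOV, `tan(45°) = 1`, so `fx = fy = 256` and the principal point sits at (256, 256). A dependency-free sketch:

```python
import math

def pinhole_K(w, h, fov_deg):
    fov = math.radians(fov_deg)
    fx = w / (2 * math.tan(fov / 2))
    # Square pixels assumed, as in compute_pinhole_intrinsics above.
    return [[fx, 0, w / 2], [0, fx, h / 2], [0, 0, 1]]

K = pinhole_K(512, 512, 90)
print(K[0][0])  # ~256.0 (up to floating-point rounding)
```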
@@ -166,7 +166,7 @@ def combine_images_to_grid(
     images: list[str | Image.Image],
     cat_row_col: tuple[int, int] = None,
     target_wh: tuple[int, int] = (512, 512),
-) -> list[str | Image.Image]:
+) -> list[Image.Image]:
     n_images = len(images)
     if n_images == 1:
         return images
@@ -377,6 +377,42 @@ def parse_text_prompts(prompts: list[str]) -> list[str]:
     return prompts


+def alpha_blend_rgba(
+    fg_image: Union[str, Image.Image, np.ndarray],
+    bg_image: Union[str, Image.Image, np.ndarray],
+) -> Image.Image:
+    """Alpha blends a foreground RGBA image over a background RGBA image.
+
+    Args:
+        fg_image: Foreground image. Can be a file path (str), a PIL Image,
+            or a NumPy ndarray.
+        bg_image: Background image. Can be a file path (str), a PIL Image,
+            or a NumPy ndarray.
+
+    Returns:
+        A PIL Image representing the alpha-blended result in RGBA mode.
+    """
+    if isinstance(fg_image, str):
+        fg_image = Image.open(fg_image)
+    elif isinstance(fg_image, np.ndarray):
+        fg_image = Image.fromarray(fg_image)
+
+    if isinstance(bg_image, str):
+        bg_image = Image.open(bg_image)
+    elif isinstance(bg_image, np.ndarray):
+        bg_image = Image.fromarray(bg_image)
+
+    if fg_image.size != bg_image.size:
+        raise ValueError(
+            f"Image sizes do not match: {fg_image.size} vs. {bg_image.size}."
+        )
+
+    fg = fg_image.convert("RGBA")
+    bg = bg_image.convert("RGBA")
+
+    return Image.alpha_composite(bg, fg)
+
+
 def check_object_edge_truncated(
     mask: np.ndarray, edge_threshold: int = 5
 ) -> bool:
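`Image.alpha_composite` implements the standard Porter-Duff "over" operator: per pixel, `out_a = fg_a + bg_a * (1 - fg_a)` and `out_rgb = (fg_rgb * fg_a + bg_rgb * bg_a * (1 - fg_a)) / out_a`. A scalar sketch on normalized [0, 1] values:

```python
def over(fg_rgb, fg_a, bg_rgb, bg_a):
    """Porter-Duff 'over' for one pixel; channels and alphas in [0, 1]."""
    out_a = fg_a + bg_a * (1 - fg_a)
    if out_a == 0:
        # Both layers fully transparent: nothing to show.
        return (0.0, 0.0, 0.0), 0.0
    out_rgb = tuple(
        (f * fg_a + b * bg_a * (1 - fg_a)) / out_a
        for f, b in zip(fg_rgb, bg_rgb)
    )
    return out_rgb, out_a

# A fully opaque foreground hides the background entirely.
rgb, a = over((1.0, 0.0, 0.0), 1.0, (0.0, 0.0, 1.0), 1.0)
print(rgb, a)  # (1.0, 0.0, 0.0) 1.0
```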
@@ -400,8 +436,15 @@ def check_object_edge_truncated(


 if __name__ == "__main__":
-    merge_video_video(
-        "outputs/imageto3d/room_bottle7/room_bottle_007/URDF_room_bottle_007/mesh_glo_normal.mp4",  # noqa
-        "outputs/imageto3d/room_bottle7/room_bottle_007/URDF_room_bottle_007/mesh.mp4",  # noqa
-        "merge.mp4",
-    )
+    image_paths = [
+        "outputs/layouts_sim/task_0000/images/pen.png",
+        "outputs/layouts_sim/task_0000/images/notebook.png",
+        "outputs/layouts_sim/task_0000/images/mug.png",
+        "outputs/layouts_sim/task_0000/images/lamp.png",
+        "outputs/layouts_sim2/task_0014/images/cloth.png",  # TODO
+    ]
+    for image_path in image_paths:
+        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
+        mask = image[..., -1]
+        flag = check_object_edge_truncated(mask)
+        print(flag, image_path)

706 embodied_gen/utils/simulation.py (new file)
@@ -0,0 +1,706 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.


import logging
import os
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Literal

import mplib
import numpy as np
import sapien.core as sapien
import torch
from PIL import Image, ImageColor
from scipy.spatial.transform import Rotation as R
from tqdm import tqdm

from embodied_gen.data.utils import DiffrastRender
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import quaternion_multiply

logger = logging.getLogger(__name__)

from mani_skill.examples.motionplanning.panda.utils import (
    compute_grasp_info_by_obb,
)
from mani_skill.utils.geometry.trimesh_utils import get_component_mesh

COLORMAP = list(set(ImageColor.colormap.values()))
COLOR_PALETTE = np.array(
    [ImageColor.getrgb(c) for c in COLORMAP], dtype=np.uint8
)
# Aligns the SAPIEN/MuJoCo coordinate system with the world coordinate system.
SIM_COORD_ALIGN = np.array(
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, -1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)

__all__ = [
    "SIM_COORD_ALIGN",
    "FrankaPandaGrasper",
    "SimpleGrasper",
]
class SapienSceneManager:
    """A class to manage the SAPIEN simulator."""

    def __init__(
        self, sim_freq: int, ray_tracing: bool, device: str = "cuda"
    ) -> None:
        self.sim_freq = sim_freq
        self.ray_tracing = ray_tracing
        self.device = device
        self.renderer = sapien.SapienRenderer()
        self.scene = self._setup_scene()
        self.cameras: list[sapien.render.RenderCameraComponent] = []
        self.actors: dict[str, sapien.pysapien.Entity] = {}

    def _setup_scene(self) -> sapien.Scene:
        """Set up the SAPIEN scene with lighting and ground."""
        # Ray tracing settings
        if self.ray_tracing:
            sapien.render.set_camera_shader_dir("rt")
            sapien.render.set_ray_tracing_samples_per_pixel(64)
            sapien.render.set_ray_tracing_path_depth(10)
            sapien.render.set_ray_tracing_denoiser("oidn")

        # TODO: Support GPU.
        # import sapien.physx as physx
        # physx.enable_gpu()
        # scene = sapien.Scene(systems=[sapien.physx.PhysxGpuSystem(), sapien.render.RenderSystem()])

        scene = sapien.Scene()
        scene.set_timestep(1 / self.sim_freq)

        # Add lighting
        scene.set_ambient_light([0.2, 0.2, 0.2])
        scene.add_directional_light(
            direction=[0, 1, -1],
            color=[1.5, 1.45, 1.4],
            shadow=True,
            shadow_map_size=2048,
        )
        scene.add_directional_light(
            direction=[0, -0.5, 1], color=[0.8, 0.8, 0.85], shadow=False
        )
        scene.add_directional_light(
            direction=[0, -1, 1], color=[1.0, 1.0, 1.0], shadow=False
        )

        ground_material = self.renderer.create_material()
        ground_material.base_color = [0.5, 0.5, 0.5, 1]  # RGBA, gray
        ground_material.roughness = 0.7
        ground_material.metallic = 0.0
        scene.add_ground(0, render_material=ground_material)

        return scene

    def step_action(
        self,
        action: torch.Tensor,
        sim_steps_per_control: int,
        cameras: list[sapien.render.RenderCameraComponent],
        render_keys: list[str],
    ) -> dict:
        self.robot.set_action(action)
        frames = defaultdict(list)
        for _ in range(sim_steps_per_control):
            self.scene.step()

        self.scene.update_render()
        for camera in cameras:
            camera.take_picture()
            images = self.render_images(camera, render_keys=render_keys)
            frames[camera.name].append(images)

        return frames
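`step_action` holds one control action for `sim_steps_per_control` physics steps, which is how a low-frequency controller drives a high-frequency simulator. The relation `sim_steps_per_control = sim_freq // control_freq` is our reading of the `sim_freq`/`control_freq` parameters, not stated in this file:

```python
def steps_per_control(sim_freq, control_freq):
    # The simulator integrates sim_freq times per second; the controller
    # updates only control_freq times, so each action is repeated.
    assert sim_freq % control_freq == 0, "sim_freq should be a multiple of control_freq"
    return sim_freq // control_freq

print(steps_per_control(120, 20))  # 6 physics steps per control step
```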
    def load_exists_robot(
        self,
        robot_name: str,
        pose: sapien.Pose,
        control_freq: int = 20,
        robot_init_qpos_noise: float = 0.02,
        control_mode: str = "pd_joint_pos",
        backend_str: tuple[str, str] = ("cpu", "gpu"),
    ):
        from mani_skill.agents import REGISTERED_AGENTS
        from mani_skill.envs.scene import ManiSkillScene
        from mani_skill.envs.utils.system.backend import (
            parse_sim_and_render_backend,
        )

        if robot_name == "franka":
            robot_name = "panda"  # remapping
        if robot_name not in REGISTERED_AGENTS:
            raise KeyError(
                f"Robot `{robot_name}` not registered, choose from {REGISTERED_AGENTS.keys()}."
            )

        ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
        # backend = parse_sim_and_render_backend("physx_cuda", "gpu")
        backend = parse_sim_and_render_backend(*backend_str)
        scene = ManiSkillScene(
            [self.scene], device=backend_str[0], backend=backend
        )
        robot = ROBOT_CLS(
            scene=scene,
            control_freq=control_freq,
            control_mode=control_mode,
            initial_pose=pose,
        )
        # Set the robot's initial joint angles (joint0-joint6 plus two fingers).
        qpos = np.array(
            [
                0.0,
                np.pi / 8,
                0,
                -np.pi * 3 / 8,
                0,
                np.pi * 3 / 4,
                np.pi / 4,
                0.04,
                0.04,
            ]
        )
        qpos = (
            np.random.normal(
                0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
            )
            + qpos
        )
        qpos[:, -2:] = 0.04
        robot.reset(qpos)
        robot.init_qpos = robot.robot.qpos
        robot.controller.controllers["gripper"].reset()

        return robot
    def load_actor_from_urdf(
        self,
        file_path: str,
        pose: sapien.Pose,
        use_static: bool = False,
        update_mass: bool = False,
    ) -> sapien.pysapien.Entity:
        """Load an actor from a URDF file into the scene.

        Args:
            file_path (str): Path to the URDF file.
            pose (sapien.Pose): Pose of the actor.
            use_static (bool): Whether to create a static actor.
            update_mass (bool): Whether to update the actor's mass from the URDF.

        Returns:
            sapien.pysapien.Entity: The loaded actor.
        """
        tree = ET.parse(file_path)
        root = tree.getroot()
        node_name = root.get("name")
        file_dir = os.path.dirname(file_path)
        visual_file = root.find(".//visual/geometry/mesh").get("filename")
        collision_file = root.find(".//collision/geometry/mesh").get(
            "filename"
        )
        visual_file = os.path.join(file_dir, visual_file)
        collision_file = os.path.join(file_dir, collision_file)
        static_fric = root.find(".//collision/gazebo/mu1").text
        dynamic_fric = root.find(".//collision/gazebo/mu2").text

        material = self.scene.create_physical_material(
            static_friction=np.clip(float(static_fric), 0.1, 0.7),
            dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
            restitution=0.05,
        )
        builder = self.scene.create_actor_builder()
        body_type = "static" if use_static else "dynamic"
        builder.set_physx_body_type(body_type)
        builder.add_multiple_convex_collisions_from_file(
            collision_file, material=material
        )
        builder.add_visual_from_file(visual_file)
        actor = builder.build(name=node_name)
        actor.set_name(node_name)
        actor.set_pose(pose)

        if update_mass and hasattr(actor.components[1], "mass"):
            node_mass = float(root.find(".//inertial/mass").get("value"))
            actor.components[1].set_mass(node_mass)

        return actor
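`load_actor_from_urdf` pulls the visual/collision meshes and friction out of the URDF with plain ElementTree XPath queries. A self-contained sketch on a minimal URDF string (the file content and values here are invented for illustration; the `gazebo/mu1` placement mirrors the queries above):

```python
import xml.etree.ElementTree as ET

URDF = """
<robot name="mug">
  <link name="base">
    <visual><geometry><mesh filename="mesh/mug.obj"/></geometry></visual>
    <collision>
      <geometry><mesh filename="mesh/mug_collision.obj"/></geometry>
      <gazebo><mu1>0.5</mu1><mu2>0.4</mu2></gazebo>
    </collision>
  </link>
</robot>
"""

root = ET.fromstring(URDF)
# Same XPath queries as in load_actor_from_urdf.
visual = root.find(".//visual/geometry/mesh").get("filename")
static_fric = float(root.find(".//collision/gazebo/mu1").text)
print(root.get("name"), visual, static_fric)  # mug mesh/mug.obj 0.5
```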
    def create_camera(
        self,
        cam_name: str,
        pose: sapien.Pose,
        image_hw: tuple[int, int],
        fovy_deg: float,
    ) -> sapien.render.RenderCameraComponent:
        """Create a single camera in the scene.

        Args:
            cam_name (str): Name of the camera.
            pose (sapien.Pose): Camera pose, p=(x, y, z), q=(w, x, y, z).
            image_hw (tuple[int, int]): Image resolution (height, width).
            fovy_deg (float): Vertical field of view in degrees.

        Returns:
            sapien.render.RenderCameraComponent: The created camera.
        """
        cam_actor = self.scene.create_actor_builder().build_kinematic()
        cam_actor.set_pose(pose)
        camera = self.scene.add_mounted_camera(
            name=cam_name,
            mount=cam_actor,
            pose=sapien.Pose(p=[0, 0, 0], q=[1, 0, 0, 0]),
            width=image_hw[1],
            height=image_hw[0],
            fovy=np.deg2rad(fovy_deg),
            near=0.01,
            far=100,
        )
        self.cameras.append(camera)

        return camera
    def initialize_circular_cameras(
        self,
        num_cameras: int,
        radius: float,
        height: float,
        target_pt: list[float],
        image_hw: tuple[int, int],
        fovy_deg: float,
    ) -> list[sapien.render.RenderCameraComponent]:
        """Initialize multiple cameras arranged in a circle.

        Args:
            num_cameras (int): Number of cameras to create.
            radius (float): Radius of the camera circle.
            height (float): Fixed Z-coordinate of the cameras.
            target_pt (list[float]): 3D point (x, y, z) that cameras look at.
            image_hw (tuple[int, int]): Image resolution (height, width).
            fovy_deg (float): Vertical field of view in degrees.

        Returns:
            list[sapien.render.RenderCameraComponent]: List of created cameras.
        """
        angle_step = 2 * np.pi / num_cameras
        world_up_vec = np.array([0.0, 0.0, 1.0])
        target_pt = np.array(target_pt)

        for i in range(num_cameras):
            angle = i * angle_step
            cam_x = radius * np.cos(angle)
            cam_y = radius * np.sin(angle)
            cam_z = height
            eye_pos = [cam_x, cam_y, cam_z]

            forward_vec = target_pt - eye_pos
            forward_vec = forward_vec / np.linalg.norm(forward_vec)
            temp_right_vec = np.cross(forward_vec, world_up_vec)

            if np.linalg.norm(temp_right_vec) < 1e-6:
                temp_right_vec = np.array([1.0, 0.0, 0.0])
            if np.abs(np.dot(temp_right_vec, forward_vec)) > 0.99:
                temp_right_vec = np.array([0.0, 1.0, 0.0])

            right_vec = temp_right_vec / np.linalg.norm(temp_right_vec)
            up_vec = np.cross(right_vec, forward_vec)
            rotation_matrix = np.array([forward_vec, -right_vec, up_vec]).T

            rot = R.from_matrix(rotation_matrix)
            scipy_quat = rot.as_quat()  # (x, y, z, w)
            quat = [
                scipy_quat[3],
                scipy_quat[0],
                scipy_quat[1],
                scipy_quat[2],
            ]  # (w, x, y, z)

            self.create_camera(
                f"camera_{i}",
                sapien.Pose(p=eye_pos, q=quat),
                image_hw,
                fovy_deg,
            )

        return self.cameras
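`initialize_circular_cameras` builds each pose by a look-at construction: forward points at the target, right = forward x world-up, up = right x forward, assembled column-wise as [forward, -right, up]. The key invariant is that the three axes come out as mutually orthogonal unit vectors. A dependency-free sketch of the axis construction (helper names are ours; degenerate parallel cases are omitted):

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_at_axes(eye, target, world_up=(0.0, 0.0, 1.0)):
    forward = norm(sub(target, eye))
    right = norm(cross(forward, world_up))  # assumes forward not parallel to up
    up = cross(right, forward)
    return forward, right, up

f, r, u = look_at_axes([2.0, 0.0, 1.0], [0.0, 0.0, 0.0])
# All pairwise dot products vanish: the frame is orthogonal.
print(abs(dot(f, r)) < 1e-9, abs(dot(f, u)) < 1e-9, abs(dot(r, u)) < 1e-9)
```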
    def render_images(
        self,
        camera: sapien.render.RenderCameraComponent,
        render_keys: list[
            Literal[
                "Color",
                "Segmentation",
                "Normal",
                "Mask",
                "Depth",
                "Foreground",
            ]
        ] = None,
    ) -> dict[str, Image.Image]:
        """Render images from a given camera.

        Args:
            camera (sapien.render.RenderCameraComponent): The camera to render from.
            render_keys (list[str]): Types of images to render (e.g., Color, Segmentation).

        Returns:
            dict[str, Image.Image]: Dictionary of rendered images.
        """
        if render_keys is None:
            render_keys = [
                "Color",
                "Segmentation",
                "Normal",
                "Mask",
                "Depth",
                "Foreground",
            ]

        results: dict[str, Image.Image] = {}
        if "Color" in render_keys:
            color = camera.get_picture("Color")
            color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
            results["Color"] = Image.fromarray(color_rgb)

        if "Mask" in render_keys:
            color = camera.get_picture("Color")
            alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
            results["Mask"] = Image.fromarray(alpha)

        if "Segmentation" in render_keys:
            seg_labels = camera.get_picture("Segmentation")
            label0 = seg_labels[..., 0].astype(np.uint8)
            # COLOR_PALETTE is a module-level constant, not a class attribute.
            seg_color = COLOR_PALETTE[label0]
            results["Segmentation"] = Image.fromarray(seg_color)

        if "Foreground" in render_keys:
            seg_labels = camera.get_picture("Segmentation")
            label0 = seg_labels[..., 0]
            mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
            color = camera.get_picture("Color")
            color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
            foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
            results["Foreground"] = Image.fromarray(foreground)

        if "Normal" in render_keys:
            normal = camera.get_picture("Normal")[..., :3]
            normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
            results["Normal"] = Image.fromarray(normal_img)

        if "Depth" in render_keys:
            position_map = camera.get_picture("Position")
            depth = -position_map[..., 2]
            color = camera.get_picture("Color")  # alpha marks valid pixels
            alpha = torch.tensor(color[..., 3], dtype=torch.float32)
            norm_depth = DiffrastRender.normalize_map_by_mask(
                torch.tensor(depth), alpha
            )
            depth_img = (norm_depth * 255).to(torch.uint8).numpy()
            results["Depth"] = Image.fromarray(depth_img)

        return results
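The segmentation branch colors a per-pixel integer label map by indexing a palette array (`COLOR_PALETTE[label0]`): each label value selects one RGB row. The same idea in plain Python, with a made-up three-color palette:

```python
PALETTE = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]  # label -> RGB

labels = [
    [0, 1],
    [2, 1],
]
# Each integer label picks its palette entry, pixel by pixel.
colored = [[PALETTE[v] for v in row] for row in labels]
print(colored[0][1])  # (255, 0, 0): label 1 maps to red
```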
    def load_assets_from_layout_file(
        self,
        layout: LayoutInfo,
        z_offset: float = 0.0,
        init_quat: list[float] = [0, 0, 0, 1],
        robot_name: str = None,
        control_freq: int = 40,
        robot_init_qpos_noise: float = 0,
    ) -> None:
        """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.

        Args:
            layout (LayoutInfo): The layout information data.
            z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
            init_quat (list[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
            robot_name (str): Name of the robot to load; if None, no robot is loaded.
            control_freq (int): Control frequency for the robot, default is 40.
            robot_init_qpos_noise (float): Noise added to the robot's initial joint positions, default is 0.
        """
        for node in layout.assets:
            file_dir = layout.assets[node]
            file_name = f"{node.replace(' ', '_')}.urdf"
            urdf_file = os.path.join(file_dir, file_name)

            if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
                continue

            position = layout.position[node].copy()
            if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
                position[2] += z_offset

            use_static = (
                layout.relation.get(Scene3DItemEnum.CONTEXT.value, None)
                == node
            )

            # Combine initial quaternion with object quaternion
            x, y, z, qx, qy, qz, qw = position
            qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
            actor = self.load_actor_from_urdf(
                urdf_file,
                sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
                use_static=use_static,
                update_mass=False,
            )
            self.actors[node] = actor

        if robot_name is not None:
            robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
            x, y, z, qx, qy, qz, qw = layout.position[robot_node]
            delta_z = 0.002  # Add a small offset to avoid collision.
            pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
            self.robot = self.load_exists_robot(
                robot_name, pose, control_freq, robot_init_qpos_noise
            )

        return
class FrankaPandaGrasper(object):
    """Grasp planner for the Franka Panda robot arm.

    Args:
        robot: See `mani_skill.agents.robots.panda.panda`.
        scene: The `SapienSceneManager` object.
        sim_steps_per_control: Number of simulation steps per control step.
        control_timestep: Time step for planning.
        joint_vel_limits: Velocity limit for all joints (default: 0.9).
        joint_acc_limits: Acceleration limit for all joints (default: 0.9).
        finger_length: Length of the robot's fingers (default: 0.025).
    """

    def __init__(
        self,
        robot,
        scene: SapienSceneManager,
        sim_steps_per_control: int,
        control_timestep: float,
        joint_vel_limits: float = 0.9,
        joint_acc_limits: float = 0.9,
        finger_length: float = 0.025,
    ) -> None:
        self.agent = robot
        self.robot = robot.robot
        self.scene = scene
        self.sim_steps_per_control = sim_steps_per_control
        self.control_timestep = control_timestep
        self.joint_vel_limits = joint_vel_limits
        self.joint_acc_limits = joint_acc_limits
        self.finger_length = finger_length
        self.planner = self._setup_planner()

    def _setup_planner(self) -> mplib.Planner:
        link_names = [link.get_name() for link in self.robot.get_links()]
        joint_names = [
            joint.get_name() for joint in self.robot.get_active_joints()
        ]

        planner = mplib.Planner(
            urdf=self.agent.urdf_path,
            srdf=self.agent.urdf_path.replace(".urdf", ".srdf"),
            user_link_names=link_names,
            user_joint_names=joint_names,
            move_group="panda_hand_tcp",
            joint_vel_limits=np.ones(7) * self.joint_vel_limits,
            joint_acc_limits=np.ones(7) * self.joint_acc_limits,
        )
        planner.set_base_pose(self.robot.pose.raw_pose[0].tolist())

        return planner

    def update_state(
        self,
        result: dict,
        gripper_state: Literal[-1, 1],
        sim_steps_per_control: int,
        cameras: list[sapien.render.RenderCameraComponent],
        render_keys: list[str],
    ) -> dict[str, list[Image.Image]]:
        n_step = len(result["position"])
        total_frames = defaultdict(list)
        for i in tqdm(range(n_step), desc="Grasping"):
            qpos = result["position"][min(i, n_step - 1)]
            action = np.hstack([qpos, gripper_state])[None, ...]
            action = torch.from_numpy(action).float()
            frames = self.scene.step_action(
                action, sim_steps_per_control, cameras, render_keys
            )
            for camera in cameras:
                total_frames[camera.name].extend(frames[camera.name])

        return total_frames

    def control_gripper(
        self,
        sim_steps_per_control: int,
        cameras: list[sapien.render.RenderCameraComponent],
        render_keys: list[str],
        gripper_state: Literal[-1, 1],
        n_step: int = 10,
    ) -> dict[str, list[Image.Image]]:
        total_frames = defaultdict(list)
        qpos = self.robot.get_qpos()[0, :-2].cpu().numpy()
        for _ in range(n_step):
            action = np.hstack([qpos, gripper_state])[None, ...]
            action = torch.from_numpy(action).float()
            frames = self.scene.step_action(
                action, sim_steps_per_control, cameras, render_keys
            )
            for camera in cameras:
                total_frames[camera.name].extend(frames[camera.name])

        return total_frames

    def move_to_pose(
        self,
        pose: sapien.Pose,
        sim_steps_per_control: int,
        control_timestep: float,
        cameras: list[sapien.render.RenderCameraComponent],
        render_keys: list[str],
        gripper_state: Literal[-1, 1],
        use_point_cloud: bool = False,
        n_max_step: int = 100,
    ) -> dict[str, list[Image.Image]]:
        # First try screw motion planning
        result = self.planner.plan_screw(
            np.concatenate([pose.p, pose.q]),
            self.robot.get_qpos().cpu().numpy()[0],
            time_step=control_timestep,
            use_point_cloud=use_point_cloud,
        )

        # Fallback to standard planning if screw fails
|
||||||
|
if result["status"] != "Success":
|
||||||
|
result = self.planner.plan_qpos_to_pose(
|
||||||
|
np.concatenate([pose.p, pose.q]),
|
||||||
|
self.robot.get_qpos().cpu().numpy()[0],
|
||||||
|
time_step=control_timestep,
|
||||||
|
use_point_cloud=use_point_cloud,
|
||||||
|
)
|
||||||
|
|
||||||
|
if result["status"] != "Success":
|
||||||
|
return defaultdict(list)
|
||||||
|
|
||||||
|
for key in result:
|
||||||
|
if key in ["status", "duration"]:
|
||||||
|
continue
|
||||||
|
sample_ratio = (len(result[key]) // n_max_step) + 1
|
||||||
|
result[key] = result[key][::sample_ratio]
|
||||||
|
|
||||||
|
frames = self.update_state(
|
||||||
|
result, gripper_state, sim_steps_per_control, cameras, render_keys
|
||||||
|
)
|
||||||
|
|
||||||
|
return frames
|
||||||
|
|
||||||
|
def render_grasp(
|
||||||
|
self,
|
||||||
|
actor: sapien.pysapien.Entity,
|
||||||
|
cameras: list[sapien.render.RenderCameraComponent],
|
||||||
|
render_keys: list[
|
||||||
|
Literal[
|
||||||
|
"Color",
|
||||||
|
"Foreground",
|
||||||
|
"Segmentation",
|
||||||
|
"Normal",
|
||||||
|
"Mask",
|
||||||
|
"Depth",
|
||||||
|
]
|
||||||
|
],
|
||||||
|
reach_target_only: bool = True,
|
||||||
|
) -> dict[str, list[Image.Image]]:
|
||||||
|
physx_rigid = actor.components[1]
|
||||||
|
mesh = get_component_mesh(physx_rigid, to_world_frame=True)
|
||||||
|
obb = mesh.bounding_box_oriented
|
||||||
|
approaching = np.array([0, 0, -1])
|
||||||
|
target_closing = (
|
||||||
|
self.agent.tcp.pose.to_transformation_matrix()[0, :3, 1]
|
||||||
|
.cpu()
|
||||||
|
.numpy()
|
||||||
|
)
|
||||||
|
grasp_info = compute_grasp_info_by_obb(
|
||||||
|
obb,
|
||||||
|
approaching=approaching,
|
||||||
|
target_closing=target_closing,
|
||||||
|
depth=self.finger_length,
|
||||||
|
)
|
||||||
|
closing, center = grasp_info["closing"], grasp_info["center"]
|
||||||
|
raw_tcp_pose = self.agent.tcp.pose.sp
|
||||||
|
grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
|
||||||
|
reach_pose = grasp_pose * sapien.Pose(p=[0, 0, -0.05])
|
||||||
|
grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
|
||||||
|
|
||||||
|
total_frames = defaultdict(list)
|
||||||
|
frames_list = []
|
||||||
|
reach_frames = self.move_to_pose(
|
||||||
|
reach_pose,
|
||||||
|
self.sim_steps_per_control,
|
||||||
|
self.control_timestep,
|
||||||
|
cameras,
|
||||||
|
render_keys,
|
||||||
|
gripper_state=1,
|
||||||
|
)
|
||||||
|
frames_list.append(reach_frames)
|
||||||
|
if len(reach_frames) == 0:
|
||||||
|
logger.warning(
|
||||||
|
f"Failed to reach the grasp pose for node `{actor.name}`, skipping grasping."
|
||||||
|
)
|
||||||
|
return total_frames
|
||||||
|
|
||||||
|
if not reach_target_only:
|
||||||
|
grasp_frames = self.move_to_pose(
|
||||||
|
grasp_pose,
|
||||||
|
self.sim_steps_per_control,
|
||||||
|
self.control_timestep,
|
||||||
|
cameras,
|
||||||
|
render_keys,
|
||||||
|
gripper_state=1,
|
||||||
|
)
|
||||||
|
frames_list.append(grasp_frames)
|
||||||
|
close_frames = self.control_gripper(
|
||||||
|
self.sim_steps_per_control,
|
||||||
|
cameras,
|
||||||
|
render_keys,
|
||||||
|
gripper_state=-1,
|
||||||
|
)
|
||||||
|
frames_list.append(close_frames)
|
||||||
|
back_frames = self.move_to_pose(
|
||||||
|
raw_tcp_pose,
|
||||||
|
self.sim_steps_per_control,
|
||||||
|
self.control_timestep,
|
||||||
|
cameras,
|
||||||
|
render_keys,
|
||||||
|
gripper_state=-1,
|
||||||
|
)
|
||||||
|
frames_list.append(back_frames)
|
||||||
|
|
||||||
|
for frame_dict in frames_list:
|
||||||
|
for cam_name, frames in frame_dict.items():
|
||||||
|
total_frames[cam_name].extend(frames)
|
||||||
|
|
||||||
|
self.agent.reset(self.agent.init_qpos)
|
||||||
|
|
||||||
|
return total_frames
|
||||||
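The `move_to_pose` method above caps any planned trajectory at roughly `n_max_step` waypoints by striding each result array. A standalone sketch of that downsampling step (the `downsample` helper name is hypothetical, not part of the codebase):

```python
def downsample(waypoints: list, n_max_step: int = 100) -> list:
    # Same logic as move_to_pose: stride so the output never exceeds
    # n_max_step entries while always keeping the first waypoint.
    sample_ratio = (len(waypoints) // n_max_step) + 1
    return waypoints[::sample_ratio]


print(len(downsample(list(range(250)))))  # ratio 250 // 100 + 1 = 3 -> 84 waypoints
print(len(downsample(list(range(80)))))   # ratio 1 -> trajectory unchanged (80)
```

Since `sample_ratio > len / n_max_step`, the result is always at most `n_max_step` waypoints, though it can undershoot by nearly 2x (e.g. a 101-entry trajectory strides to 51).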
@@ -1 +1 @@
-VERSION = "v0.1.2"
+VERSION = "v0.1.3"
@@ -109,7 +109,7 @@ class MeshGeoChecker(BaseChecker):
         if self.prompt is None:
             self.prompt = """
 You are an expert in evaluating the geometry quality of generated 3D asset.
-You will be given rendered views of a generated 3D asset with black background.
+You will be given rendered views of a generated 3D asset, type {}, with black background.
 Your task is to evaluate the quality of the 3D asset generation,
 including geometry, structure, and appearance, based on the rendered views.
 Criteria:
@@ -130,10 +130,13 @@ class MeshGeoChecker(BaseChecker):
 Image shows a chair with simplified back legs and soft edges → YES
 """
 
-    def query(self, image_paths: list[str | Image.Image]) -> str:
+    def query(
+        self, image_paths: list[str | Image.Image], text: str = "unknown"
+    ) -> str:
+        input_prompt = self.prompt.format(text)
+
         return self.gpt_client.query(
-            text_prompt=self.prompt,
+            text_prompt=input_prompt,
             image_base64=image_paths,
         )
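The change above threads the asset category into the checker prompt through a `{}` placeholder filled by `str.format` at query time. A minimal sketch of the pattern (prompt text shortened for illustration):

```python
prompt = (
    "You are an expert in evaluating the geometry quality of generated 3D asset.\n"
    "You will be given rendered views of a generated 3D asset, "
    "type {}, with black background."
)

# Fill the placeholder with the object category before sending to the VLM.
input_prompt = prompt.format("pen")
print(input_prompt.splitlines()[-1])
```

Note that `str.format` treats every brace pair as a placeholder, so any literal `{` or `}` added to the prompt later would need doubling (`{{`, `}}`).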
@@ -107,36 +107,37 @@ class URDFGenerator(object):
 already provided, use it directly), accurately describe this 3D object asset (within 15 words),
 Determine the pose of the object in the first image and estimate the true vertical height
 (vertical projection) range of the object (in meters), i.e., how tall the object appears from top
-to bottom in the front view (first) image. also weight range (unit: kilogram), the average
+to bottom in the first image. also weight range (unit: kilogram), the average
 static friction coefficient of the object relative to rubber and the average dynamic friction
-coefficient of the object relative to rubber. Return response format as shown in Output Example.
+coefficient of the object relative to rubber. Return response in format as shown in Output Example.
 
 Output Example:
 Category: cup
 Description: shiny golden cup with floral design
-Height: 0.1-0.15 m
+Pose: <short_description_within_10_words>
+Height: 0.10-0.15 m
 Weight: 0.3-0.6 kg
 Static friction coefficient: 0.6
 Dynamic friction coefficient: 0.5
 
-IMPORTANT: Estimating Vertical Height from the First (Front View) Image.
+IMPORTANT: Estimating Vertical Height from the First (Front View) Image and pose estimation based on all views.
 - The "vertical height" refers to the real-world vertical size of the object
 as projected in the first image, aligned with the image's vertical axis.
 - For flat objects like plates or disks or book, if their face is visible in the front view,
 use the diameter as the vertical height. If the edge is visible, use the thickness instead.
 - This is not necessarily the full length of the object, but how tall it appears
-in the first image vertically, based on its pose and orientation.
+in the first image vertically, based on its pose and orientation estimation on all views.
-- For objects(e.g., spoons, forks, writing instruments etc.) at an angle showing in
-the first image, tilted at 45° will appear shorter vertically than when upright.
+- For objects(e.g., spoons, forks, writing instruments etc.) at an angle showing in images,
+e.g., tilted at 45° will appear shorter vertically than when upright.
 Estimate the vertical projection of their real length based on its pose.
 For example:
-- A pen standing upright in the first view (aligned with the image's vertical axis)
+- A pen standing upright in the first image (aligned with the image's vertical axis)
 full body visible in the first image: → vertical height ≈ 0.14-0.20 m
-- A pen lying flat in the front view (showing thickness) → vertical height ≈ 0.018-0.025 m
+- A pen lying flat in the first image (showing thickness or as a dot) → vertical height ≈ 0.018-0.025 m
 - Tilted pen in the first image (e.g., ~45° angle): vertical height ≈ 0.07-0.12 m
-- Use the rest views(except the first image) to help determine the object's 3D pose and orientation.
+- Use the rest views to help determine the object's 3D pose and orientation.
 Assume the object is in real-world scale and estimate the approximate vertical height
-(in meters) based on how large it appears vertically in the first image.
+based on the pose estimation and how large it appears vertically in the first image.
 """
 )
 
@@ -163,14 +164,14 @@ class URDFGenerator(object):
         description = lines[1].split(": ")[1]
         min_height, max_height = map(
             lambda x: float(x.strip().replace(",", "").split()[0]),
-            lines[2].split(": ")[1].split("-"),
+            lines[3].split(": ")[1].split("-"),
         )
         min_mass, max_mass = map(
             lambda x: float(x.strip().replace(",", "").split()[0]),
-            lines[3].split(": ")[1].split("-"),
+            lines[4].split(": ")[1].split("-"),
         )
-        mu1 = float(lines[4].split(": ")[1].replace(",", ""))
-        mu2 = float(lines[5].split(": ")[1].replace(",", ""))
+        mu1 = float(lines[5].split(": ")[1].replace(",", ""))
+        mu2 = float(lines[6].split(": ")[1].replace(",", ""))
 
         return {
             "category": category.lower(),
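The index shifts in the second hunk follow directly from the new `Pose:` line in the prompt's Output Example: every field after `Description` moves down by one, so `Height` becomes `lines[3]`, `Weight` `lines[4]`, and the friction coefficients `lines[5]`/`lines[6]`. A standalone sketch of parsing the updated response format (the `parse_response` helper is hypothetical, simplified from the real parser):

```python
def parse_response(response: str) -> dict:
    lines = response.strip().splitlines()
    # Height line, e.g. "Height: 0.10-0.15 m" -> (0.10, 0.15)
    min_h, max_h = (
        float(x.strip().split()[0]) for x in lines[3].split(": ")[1].split("-")
    )
    return {
        "category": lines[0].split(": ")[1].lower(),
        "pose": lines[2].split(": ")[1],          # new field in v0.1.3 format
        "height": (min_h, max_h),
        "mu_static": float(lines[5].split(": ")[1]),
        "mu_dynamic": float(lines[6].split(": ")[1]),
    }


response = """Category: cup
Description: shiny golden cup with floral design
Pose: upright, handle facing right
Height: 0.10-0.15 m
Weight: 0.3-0.6 kg
Static friction coefficient: 0.6
Dynamic friction coefficient: 0.5"""
print(parse_response(response)["height"])  # (0.1, 0.15)
```

Index-based parsing like this is brittle by design: any new field in the prompt's Output Example forces every downstream `lines[...]` index to shift in lockstep, which is exactly what this hunk does.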
@@ -7,7 +7,7 @@ packages = ["embodied_gen"]
 
 [project]
 name = "embodied_gen"
-version = "v0.1.2"
+version = "v0.1.3"
 readme = "README.md"
 license = "Apache-2.0"
 license-files = ["LICENSE", "NOTICE"]
@@ -32,6 +32,8 @@ backproject-cli = "embodied_gen.data.backproject_v2:entrypoint"
 img3d-cli = "embodied_gen.scripts.imageto3d:entrypoint"
 text3d-cli = "embodied_gen.scripts.textto3d:text_to_3d"
 scene3d-cli = "embodied_gen.scripts.gen_scene3d:entrypoint"
+layout-cli = "embodied_gen.scripts.gen_layout:entrypoint"
+sim-cli = "embodied_gen.scripts.simulate_sapien:entrypoint"
 
 [tool.pydocstyle]
 match = '(?!test_).*(?!_pb2)\.py'
@@ -38,4 +38,5 @@ tyro
 pyquaternion
 shapely
 sapien==3.0.0b1
+mani_skill==3.0.0b21
 typing_extensions==4.14.1
@@ -77,10 +77,18 @@ def panoocc_checker():
 
 
 def test_geo_checker(geo_checker):
+    from embodied_gen.utils.process_media import combine_images_to_grid
+
+    image_paths = glob(
+        "outputs/layouts_gens3/task_0000/asset3d/pen/result/renders/image_color/*png"
+    )
+    image = combine_images_to_grid(image_paths)[0]
+    # image = "outputs/utest_cases/geo_checker/color.png"
     flag, result = geo_checker(
         [
-            "apps/assets/example_image/sample_02.jpg",
-        ]
+            image,
+        ],
+        text="pen",
     )
     logger.info(f"geo_checker: {flag}, {result}")
     assert isinstance(flag, bool)
@@ -107,13 +115,42 @@ def test_seg_checker(seg_checker):
 
 
 def test_semantic_checker(semantic_checker):
-    flag, result = semantic_checker(
-        text="can",
-        image=["apps/assets/example_image/sample_02.jpg"],
-    )
-    logger.info(f"semantic_checker: {flag}, {result}")
-    assert isinstance(flag, bool)
-    assert isinstance(result, str)
+    test_cases = [
+        ("table", "outputs/utest_cases/semantic_checker/task_0001_table.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0002_table.png"),
+        (
+            "banana",
+            "outputs/utest_cases/semantic_checker/task_0002_banana.png",
+        ),
+        (
+            "napkin",
+            "outputs/utest_cases/semantic_checker/task_0003_napkin.png",
+        ),
+        ("table", "outputs/utest_cases/semantic_checker/task_0003_table.png"),
+        ("fork", "outputs/utest_cases/semantic_checker/task_0005_fork.png"),
+        ("spoon", "outputs/utest_cases/semantic_checker/task_0005_spoon.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0007_table.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0008_table.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0009_table.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0011_table.png"),
+        ("desk", "outputs/utest_cases/semantic_checker/task_0012_desk.png"),
+        (
+            "laptop",
+            "outputs/utest_cases/semantic_checker/task_0012_laptop.png",
+        ),
+        ("table", "outputs/utest_cases/semantic_checker/task_0014_table.png"),
+        ("desk", "outputs/utest_cases/semantic_checker/task_0016_desk.png"),
+        ("shelf", "outputs/utest_cases/semantic_checker/task_0018_shelf.png"),
+        ("table", "outputs/utest_cases/semantic_checker/task_0000_table.png"),
+    ]
+    for test_case in test_cases:
+        flag, result = semantic_checker(
+            text=test_case[0],
+            image=[test_case[1]],
+        )
+        logger.info(f"semantic_checker: {flag}, {result}, {test_case[1]}")
+        assert isinstance(flag, bool)
+        assert isinstance(result, str)
 
 
 @pytest.mark.parametrize(
@@ -139,25 +176,21 @@ def test_textgen_checker(textalign_checker, mesh_path, text_desc):
 
 
 def test_panoheight_estimator(pano_height_estimator):
-    image_paths = glob("outputs/bg_v3/test2/*/*.png")
+    image_paths = glob("outputs/bg_scenes/*/*.png")
     for image_path in image_paths:
        result = pano_height_estimator(image_path)
        logger.info(f"{type(result)}, {result}")
 
 
 def test_pano_checker(pano_checker):
-    # image_paths = [
-    #     "outputs/bg_gen2/scene_0000/pano_image.png",
-    #     "outputs/bg_gen2/scene_0001/pano_image.png",
-    # ]
-    image_paths = glob("outputs/bg_gen/*/*.png")
+    image_paths = glob("outputs/bg_scenes/*/*.png")
     for image_path in image_paths:
         flag, result = pano_checker(image_path)
         logger.info(f"{image_path} {flag}, {result}")
 
 
 def test_panoocc_checker(panoocc_checker):
-    image_paths = glob("outputs/bg_gen/*/*.png")
+    image_paths = glob("outputs/bg_scenes/*/*.png")
     for image_path in image_paths:
         flag, result = panoocc_checker(image_path)
         logger.info(f"{image_path} {flag}, {result}")
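The rewritten `test_semantic_checker` runs all cases inside one test body, so the first failing assertion masks the remaining cases. The file already uses `@pytest.mark.parametrize` elsewhere; a sketch of applying it here, with a hypothetical `fake_checker` standing in for the real GPT-backed `semantic_checker` fixture:

```python
import pytest


def fake_checker(text: str, image: list) -> tuple:
    # Stand-in for the GPT-backed semantic checker; for illustration only.
    return True, f"image {image[0]} matches '{text}'"


CASES = [
    ("table", "outputs/utest_cases/semantic_checker/task_0001_table.png"),
    ("banana", "outputs/utest_cases/semantic_checker/task_0002_banana.png"),
]


@pytest.mark.parametrize("text,image_path", CASES)
def test_semantic_case(text, image_path):
    # Each (text, image) pair becomes its own test, reported separately.
    flag, result = fake_checker(text=text, image=[image_path])
    assert isinstance(flag, bool)
    assert isinstance(result, str)
```

With parametrization, `pytest` reports one pass/fail line per case and keeps going after a failure, which makes regressions in individual assets much easier to spot.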
tests/test_examples/test_urdf_convertor.py (new file, 51 lines)
@@ -0,0 +1,51 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+from embodied_gen.utils.gpt_clients import GPT_CLIENT
+from embodied_gen.validators.urdf_convertor import URDFGenerator
+
+
+def test_urdf_convertor():
+    urdf_gen = URDFGenerator(GPT_CLIENT, render_view_num=4)
+    mesh_paths = [
+        "outputs/test_urdf/sample_0/mesh/pen.obj",
+        "outputs/test_urdf/sample_1/mesh/notepad.obj",
+        "outputs/test_urdf/sample_2/mesh/plate.obj",
+        "outputs/test_urdf/sample_3/mesh/spoon.obj",
+        "outputs/test_urdf/sample_4/mesh/notebook.obj",
+        "outputs/test_urdf/sample_5/mesh/plate.obj",
+        "outputs/test_urdf/sample_6/mesh/spoon.obj",
+        "outputs/test_urdf/sample_7/mesh/book.obj",
+        "outputs/test_urdf/sample_8/mesh/lamp.obj",
+        "outputs/test_urdf/sample_9/mesh/remote_control.obj",
+        "outputs/test_urdf/sample_10/mesh/keyboard.obj",
+        "outputs/test_urdf/sample_11/mesh/mouse.obj",
+        "outputs/test_urdf/sample_12/mesh/table.obj",
+        "outputs/test_urdf/sample_13/mesh/marker.obj",
+        "outputs/test_urdf/pen/result/mesh/pen.obj",
+        "outputs/test_urdf/notebook/result/mesh/notebook.obj",
+        "outputs/test_urdf/marker/result/mesh/marker.obj",
+        "outputs/test_urdf/pen2/result/mesh/pen.obj",
+    ]
+    for idx, mesh_path in enumerate(mesh_paths):
+        filename = mesh_path.split("/")[-1].split(".")[0]
+        urdf_path = urdf_gen(
+            mesh_path=mesh_path,
+            output_root=f"outputs/test_urdf2/sample_{idx}",
+            category=filename,
+            # min_height=1.0,
+            # max_height=1.2,
+        )