feat(sim): Optimize support for downstream simulators and gym. (#36)

* feat(sim): Optimize support for downstream simulators and gym.

* feat(sim): Optimize support for downstream simulators and gym.

* docs: update docs

* update version
This commit is contained in:
Xinjie 2025-08-29 23:39:13 +08:00 committed by GitHub
parent e06900fe57
commit 0ca4a4894f
GPG Key ID: B5690EEEBB952194
22 changed files with 1303 additions and 474 deletions


@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0).
+## [0.1.4] - 2025-08-29
+### Feature
+- Supports importing the output of the EmbodiedGen Layout Generation module with a single `gym.make` command to create multiple parallel simulation environments and interactive 3D worlds.
+- Supports coacd-based mesh decomposition for better efficiency and accuracy in physical simulations.
 ## [0.1.3] - 2025-08-19
 ### Feature
 - Release `layout-cli` 🏞️ Layout(Interactive 3D Worlds) Generation, generate interactive 3D scene from task description.


@@ -30,7 +30,7 @@
 ```sh
 git clone https://github.com/HorizonRobotics/EmbodiedGen.git
 cd EmbodiedGen
-git checkout v0.1.3
+git checkout v0.1.4
 git submodule update --init --recursive --progress
 conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env.
 conda activate embodiedgen
@@ -67,7 +67,8 @@ You can choose between two backends for the GPT agent:
 [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Image_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D) Generate physically plausible 3D asset URDF from single input image, offering high-quality support for digital twin systems.
 (HF space is a simplified demonstration. For the full functionality, please refer to `img3d-cli`.)
-<img src="apps/assets/image_to_3d.jpg" alt="Image to 3D" width="900">
+<img src="apps/assets/image_to_3d.jpg" alt="Image to 3D" width="700">
 ### ☁️ Service
 Run the image-to-3D generation service locally.
@@ -95,7 +96,7 @@ img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/examp
 [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Text_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D) Create 3D assets from text descriptions for a wide range of geometry and styles. (HF space is a simplified demonstration. For the full functionality, please refer to `text3d-cli`.)
-<img src="apps/assets/text_to_3d.jpg" alt="Text to 3D" width="900">
+<img src="apps/assets/text_to_3d.jpg" alt="Text to 3D" width="700">
 ### ☁️ Service
 Deploy the text-to-3D generation service locally.
@@ -133,7 +134,7 @@ ps: models with more permissive licenses found in `embodied_gen/models/image_com
 [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Texture_Gen_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen) Generate visually rich textures for 3D mesh.
-<img src="apps/assets/texture_gen.jpg" alt="Texture Gen" width="900">
+<img src="apps/assets/texture_gen.jpg" alt="Texture Gen" width="700">
 ### ☁️ Service
@@ -161,7 +162,7 @@ bash embodied_gen/scripts/texture_gen.sh \
 <h2 id="3d-scene-generation">🌍 3D Scene Generation</h2>
-<img src="apps/assets/scene3d.gif" alt="scene3d" style="width: 640px;">
+<img src="apps/assets/scene3d.gif" alt="scene3d" style="width: 600px;">
 ### ⚡ API
 > Run `bash install.sh extra` to install additional requirements if you need to use `scene3d-cli`.
@@ -184,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \
 🚧 *Coming Soon*
-<img src="apps/assets/articulate.gif" alt="articulate" style="width: 480px;">
+<img src="apps/assets/articulate.gif" alt="articulate" style="width: 430px;">
 ---
@@ -196,12 +197,12 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \
 <table>
 <tr>
-<td><img src="apps/assets/layout1.gif" alt="layout1" width="360"/></td>
-<td><img src="apps/assets/layout2.gif" alt="layout2" width="360"/></td>
+<td><img src="apps/assets/layout1.gif" alt="layout1" width="320"/></td>
+<td><img src="apps/assets/layout2.gif" alt="layout2" width="320"/></td>
 </tr>
 <tr>
-<td><img src="apps/assets/layout3.gif" alt="layout3" width="360"/></td>
-<td><img src="apps/assets/layout4.gif" alt="layout4" width="360"/></td>
+<td><img src="apps/assets/layout3.gif" alt="layout3" width="320"/></td>
+<td><img src="apps/assets/layout4.gif" alt="layout4" width="320"/></td>
 </tr>
 </table>
@@ -219,8 +220,8 @@ layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on
 <table>
 <tr>
-<td><img src="apps/assets/Iscene_demo1.gif" alt="Iscene_demo1" width="300"/></td>
-<td><img src="apps/assets/Iscene_demo2.gif" alt="Iscene_demo2" width="450"/></td>
+<td><img src="apps/assets/Iscene_demo1.gif" alt="Iscene_demo1" width="234"/></td>
+<td><img src="apps/assets/Iscene_demo2.gif" alt="Iscene_demo2" width="350"/></td>
 </tr>
 </table>
@@ -233,7 +234,7 @@ CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
     --output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &
 ```
-Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.)
+Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes.
 ```sh
 python embodied_gen/scripts/compose_layout.py \
     --layout_path "outputs/layouts_gens/task_0000/layout.json" \
@@ -243,13 +244,29 @@ python embodied_gen/scripts/compose_layout.py \
 We provide `sim-cli`, which allows users to easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (support for more simulators will be added in future updates).
 ```sh
-sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \
-    --output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka"
+sim-cli --layout_path "outputs/layouts_gen/task_0000/layout.json" \
+    --output_dir "outputs/layouts_gen/task_0000/sapien_render" --insert_robot
+```
+Example: generate multiple parallel simulation envs with `gym.make` and record sensor and trajectory data.
+<table>
+<tr>
+<td><img src="apps/assets/parallel_sim.gif" alt="parallel_sim1" width="290"/></td>
+<td><img src="apps/assets/parallel_sim2.gif" alt="parallel_sim2" width="290"/></td>
+</tr>
+</table>
+```sh
+python embodied_gen/scripts/parallel_sim.py \
+    --layout_file "outputs/layouts_gen/task_0000/layout.json" \
+    --output_dir "outputs/parallel_sim/task_0000" \
+    --num_envs 20
 ```
 ### 🖼️ Real-to-Sim Digital Twin
-<img src="apps/assets/real2sim_mujoco.gif" alt="real2sim_mujoco" style="width: 512px;">
+<img src="apps/assets/real2sim_mujoco.gif" alt="real2sim_mujoco" width="400">
 ---
@@ -280,7 +297,7 @@ If you use EmbodiedGen in your research or projects, please cite:
 ## 🙌 Acknowledgement
 EmbodiedGen builds upon the following amazing projects and models:
-🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
+🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | 🌟 [ManiSkill](https://github.com/haosulab/ManiSkill)
 ---

Binary file not shown (image added, 897 KiB).

Binary file not shown (image added, 711 KiB).


@@ -547,7 +547,9 @@ def extract_urdf(
     # Convert to URDF and recover attrs by GPT.
     filename = "sample"
-    urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+    urdf_convertor = URDFGenerator(
+        GPT_CLIENT, render_view_num=4, decompose_convex=True
+    )
     asset_attrs = {
         "version": VERSION,
         "gs_model": f"{urdf_convertor.output_mesh_dir}/(unknown)_gs.ply",


@@ -0,0 +1,161 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.
import logging
import multiprocessing as mp
import os

import coacd
import trimesh

logger = logging.getLogger(__name__)

__all__ = [
    "decompose_convex_coacd",
    "decompose_convex_mesh",
    "decompose_convex_mp",
]


def decompose_convex_coacd(
    filename: str, outfile: str, params: dict, verbose: bool = False
) -> None:
    coacd.set_log_level("info" if verbose else "warn")
    mesh = trimesh.load(filename, force="mesh")
    mesh = coacd.Mesh(mesh.vertices, mesh.faces)
    result = coacd.run_coacd(mesh, **params)
    combined = sum([trimesh.Trimesh(*m) for m in result])
    combined.export(outfile)


def decompose_convex_mesh(
    filename: str,
    outfile: str,
    threshold: float = 0.05,
    max_convex_hull: int = -1,
    preprocess_mode: str = "auto",
    preprocess_resolution: int = 30,
    resolution: int = 2000,
    mcts_nodes: int = 20,
    mcts_iterations: int = 150,
    mcts_max_depth: int = 3,
    pca: bool = False,
    merge: bool = True,
    seed: int = 0,
    verbose: bool = False,
) -> str:
    """Decompose a mesh into convex parts using the CoACD algorithm."""
    coacd.set_log_level("info" if verbose else "warn")
    if os.path.exists(outfile):
        logger.warning(f"Output file {outfile} already exists, removing it.")
        os.remove(outfile)
    params = dict(
        threshold=threshold,
        max_convex_hull=max_convex_hull,
        preprocess_mode=preprocess_mode,
        preprocess_resolution=preprocess_resolution,
        resolution=resolution,
        mcts_nodes=mcts_nodes,
        mcts_iterations=mcts_iterations,
        mcts_max_depth=mcts_max_depth,
        pca=pca,
        merge=merge,
        seed=seed,
    )
    try:
        decompose_convex_coacd(filename, outfile, params, verbose)
        if os.path.exists(outfile):
            return outfile
    except Exception as e:
        if verbose:
            print(f"Decompose convex first attempt failed: {e}.")
    if preprocess_mode != "on":
        try:
            params["preprocess_mode"] = "on"
            decompose_convex_coacd(filename, outfile, params, verbose)
            if os.path.exists(outfile):
                return outfile
        except Exception as e:
            if verbose:
                print(
                    f"Decompose convex second attempt with preprocess_mode='on' failed: {e}"
                )
    raise RuntimeError(f"Convex decomposition failed on (unknown)")


def decompose_convex_mp(
    filename: str,
    outfile: str,
    threshold: float = 0.05,
    max_convex_hull: int = -1,
    preprocess_mode: str = "auto",
    preprocess_resolution: int = 30,
    resolution: int = 2000,
    mcts_nodes: int = 20,
    mcts_iterations: int = 150,
    mcts_max_depth: int = 3,
    pca: bool = False,
    merge: bool = True,
    seed: int = 0,
    verbose: bool = False,
) -> str:
    """Decompose a mesh into convex parts using the CoACD algorithm in a separate process.

    See https://simulately.wiki/docs/toolkits/ConvexDecomp for details.
    """
    params = dict(
        threshold=threshold,
        max_convex_hull=max_convex_hull,
        preprocess_mode=preprocess_mode,
        preprocess_resolution=preprocess_resolution,
        resolution=resolution,
        mcts_nodes=mcts_nodes,
        mcts_iterations=mcts_iterations,
        mcts_max_depth=mcts_max_depth,
        pca=pca,
        merge=merge,
        seed=seed,
    )
    ctx = mp.get_context("spawn")
    p = ctx.Process(
        target=decompose_convex_coacd,
        args=(filename, outfile, params, verbose),
    )
    p.start()
    p.join()
    if p.exitcode == 0 and os.path.exists(outfile):
        return outfile
    if preprocess_mode != "on":
        params["preprocess_mode"] = "on"
        p = ctx.Process(
            target=decompose_convex_coacd,
            args=(filename, outfile, params, verbose),
        )
        p.start()
        p.join()
        if p.exitcode == 0 and os.path.exists(outfile):
            return outfile
    raise RuntimeError(f"Convex decomposition failed on (unknown)")


@@ -16,13 +16,17 @@
 import logging
+import multiprocessing as mp
+import os
 from typing import Tuple, Union
+import coacd
 import igraph
 import numpy as np
 import pyvista as pv
 import spaces
 import torch
+import trimesh
 import utils3d
 from pymeshfix import _meshfix
 from tqdm import tqdm
@@ -33,7 +37,9 @@ logging.basicConfig(
 logger = logging.getLogger(__name__)
-__all__ = ["MeshFixer"]
+__all__ = [
+    "MeshFixer",
+]
 def _radical_inverse(base, n):


@@ -0,0 +1,389 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.
import json
import os
from copy import deepcopy

import numpy as np
import sapien
import torch
import torchvision.transforms as transforms
from mani_skill.envs.sapien_env import BaseEnv
from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils
from mani_skill.utils.building import actors
from mani_skill.utils.registration import register_env
from mani_skill.utils.structs.actor import Actor
from mani_skill.utils.structs.pose import Pose
from mani_skill.utils.structs.types import (
    GPUMemoryConfig,
    SceneConfig,
    SimConfig,
)
from mani_skill.utils.visualization.misc import tile_images
from tqdm import tqdm

from embodied_gen.models.gs_model import GaussianOperator
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import bfs_placement, quaternion_multiply
from embodied_gen.utils.log import logger
from embodied_gen.utils.process_media import alpha_blend_rgba
from embodied_gen.utils.simulation import (
    SIM_COORD_ALIGN,
    load_assets_from_layout_file,
)

__all__ = ["PickEmbodiedGen"]


@register_env("PickEmbodiedGen-v1", max_episode_steps=100)
class PickEmbodiedGen(BaseEnv):
    SUPPORTED_ROBOTS = ["panda", "panda_wristcam", "fetch"]
    goal_thresh = 0.0

    def __init__(
        self,
        *args,
        robot_uids: str | list[str] = "panda",
        robot_init_qpos_noise: float = 0.02,
        num_envs: int = 1,
        reconfiguration_freq: int = None,
        **kwargs,
    ):
        self.robot_init_qpos_noise = robot_init_qpos_noise
        if reconfiguration_freq is None:
            if num_envs == 1:
                reconfiguration_freq = 1
            else:
                reconfiguration_freq = 0
        # Init params from kwargs.
        layout_file = kwargs.pop("layout_file", None)
        replace_objs = kwargs.pop("replace_objs", True)
        self.enable_grasp = kwargs.pop("enable_grasp", False)
        self.init_quat = kwargs.pop("init_quat", [0.7071, 0, 0, 0.7071])
        # Add small offset in z-axis to avoid collision.
        self.objs_z_offset = kwargs.pop("objs_z_offset", 0.002)
        self.robot_z_offset = kwargs.pop("robot_z_offset", 0.002)
        self.layouts = self.init_env_layouts(
            layout_file, num_envs, replace_objs
        )
        self.robot_pose = self.compute_robot_init_pose(
            self.layouts, num_envs, self.robot_z_offset
        )
        self.env_actors = dict()
        self.image_transform = transforms.PILToTensor()
        super().__init__(
            *args,
            robot_uids=robot_uids,
            reconfiguration_freq=reconfiguration_freq,
            num_envs=num_envs,
            **kwargs,
        )
        self.bg_images = dict()
        if self.render_mode == "hybrid":
            self.bg_images = self.render_gs3d_images(
                self.layouts, num_envs, self.init_quat
            )

    @staticmethod
    def init_env_layouts(
        layout_file: str, num_envs: int, replace_objs: bool
    ) -> list[LayoutInfo]:
        layout = LayoutInfo.from_dict(json.load(open(layout_file, "r")))
        layouts = []
        for env_idx in range(num_envs):
            if replace_objs and env_idx > 0:
                layout = bfs_placement(deepcopy(layout))
            layouts.append(layout)
        return layouts

    @staticmethod
    def compute_robot_init_pose(
        layouts: list[LayoutInfo], num_envs: int, z_offset: float = 0.0
    ) -> list[list[float]]:
        robot_pose = []
        for env_idx in range(num_envs):
            layout = layouts[env_idx]
            robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
            x, y, z, qx, qy, qz, qw = layout.position[robot_node]
            robot_pose.append([x, y, z + z_offset, qw, qx, qy, qz])
        return robot_pose

    @property
    def _default_sim_config(self):
        return SimConfig(
            scene_config=SceneConfig(
                solver_position_iterations=30,
                # contact_offset=0.04,
                # rest_offset=0.001,
            ),
            # sim_freq=200,
            control_freq=50,
            gpu_memory_config=GPUMemoryConfig(
                max_rigid_contact_count=2**20, max_rigid_patch_count=2**19
            ),
        )

    @property
    def _default_sensor_configs(self):
        pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1])
        return [
            CameraConfig("base_camera", pose, 128, 128, np.pi / 2, 0.01, 100)
        ]

    @property
    def _default_human_render_camera_configs(self):
        pose = sapien_utils.look_at(
            eye=[0.9, 0.0, 1.1], target=[0.0, 0.0, 0.9]
        )
        return CameraConfig(
            "render_camera", pose, 256, 256, np.deg2rad(75), 0.01, 100
        )

    def _load_agent(self, options: dict):
        super()._load_agent(options, sapien.Pose(p=[-10, 0, 10]))

    def _load_scene(self, options: dict):
        all_objects = []
        logger.info("Loading assets and decomposing mesh collisions...")
        for env_idx in range(self.num_envs):
            env_actors = load_assets_from_layout_file(
                self.scene,
                self.layouts[env_idx],
                z_offset=self.objs_z_offset,
                init_quat=self.init_quat,
                env_idx=env_idx,
            )
            self.env_actors[f"env{env_idx}"] = env_actors
            all_objects.extend(env_actors.values())
        self.obj = all_objects[-1]
        for obj in all_objects:
            self.remove_from_state_dict_registry(obj)
        self.all_objects = Actor.merge(all_objects, name="all_objects")
        self.add_to_state_dict_registry(self.all_objects)
        self.goal_site = actors.build_sphere(
            self.scene,
            radius=self.goal_thresh,
            color=[0, 1, 0, 0],
            name="goal_site",
            body_type="kinematic",
            add_collision=False,
            initial_pose=sapien.Pose(),
        )
        self._hidden_objects.append(self.goal_site)

    def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
        with torch.device(self.device):
            b = len(env_idx)
            goal_xyz = torch.zeros((b, 3))
            goal_xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1
            self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))
            qpos = np.array(
                [
                    0.0,
                    np.pi / 8,
                    0,
                    -np.pi * 3 / 8,
                    0,
                    np.pi * 3 / 4,
                    np.pi / 4,
                    0.04,
                    0.04,
                ]
            )
            qpos = (
                np.random.normal(
                    0, self.robot_init_qpos_noise, (self.num_envs, len(qpos))
                )
                + qpos
            )
            qpos[:, -2:] = 0.04
            self.agent.robot.set_root_pose(np.array(self.robot_pose))
            self.agent.reset(qpos)
            self.agent.init_qpos = qpos
            self.agent.controller.controllers["gripper"].reset()

    def render_gs3d_images(
        self, layouts: list[LayoutInfo], num_envs: int, init_quat: list[float]
    ) -> dict[str, np.ndarray]:
        sim_coord_align = (
            torch.tensor(SIM_COORD_ALIGN).to(torch.float32).to(self.device)
        )
        cameras = self.scene.sensors.copy()
        cameras.update(self.scene.human_render_cameras)
        bg_node = layouts[0].relation[Scene3DItemEnum.BACKGROUND.value]
        gs_path = os.path.join(layouts[0].assets[bg_node], "gs_model.ply")
        raw_gs: GaussianOperator = GaussianOperator.load_from_ply(gs_path)
        bg_images = dict()
        for env_idx in tqdm(range(num_envs), desc="Pre-rendering Background"):
            layout = layouts[env_idx]
            x, y, z, qx, qy, qz, qw = layout.position[bg_node]
            qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
            init_pose = torch.tensor([x, y, z, qx, qy, qz, qw])
            gs_model = raw_gs.get_gaussians(instance_pose=init_pose)
            for key in cameras:
                camera = cameras[key]
                Ks = camera.camera.get_intrinsic_matrix()  # (n_env, 3, 3)
                c2w = camera.camera.get_model_matrix()  # (n_env, 4, 4)
                result = gs_model.render(
                    c2w[env_idx] @ sim_coord_align,
                    Ks[env_idx],
                    image_width=camera.config.width,
                    image_height=camera.config.height,
                )
                bg_images[f"{key}-env{env_idx}"] = result.rgb[..., ::-1]
        return bg_images

    def render(self):
        if self.render_mode is None:
            raise RuntimeError("render_mode is not set.")
        if self.render_mode == "human":
            return self.render_human()
        elif self.render_mode == "rgb_array":
            res = self.render_rgb_array()
            return res
        elif self.render_mode == "sensors":
            res = self.render_sensors()
            return res
        elif self.render_mode == "all":
            return self.render_all()
        elif self.render_mode == "hybrid":
            return self.hybrid_render()
        else:
            raise NotImplementedError(
                f"Unsupported render mode {self.render_mode}."
            )

    def render_rgb_array(
        self, camera_name: str = None, return_alpha: bool = False
    ):
        for obj in self._hidden_objects:
            obj.show_visual()
        self.scene.update_render(
            update_sensors=False, update_human_render_cameras=True
        )
        images = []
        render_images = self.scene.get_human_render_camera_images(
            camera_name, return_alpha
        )
        for image in render_images.values():
            images.append(image)
        if len(images) == 0:
            return None
        if len(images) == 1:
            return images[0]
        for obj in self._hidden_objects:
            obj.hide_visual()
        return tile_images(images)

    def render_sensors(self):
        images = []
        sensor_images = self.get_sensor_images()
        for image in sensor_images.values():
            for img in image.values():
                images.append(img)
        return tile_images(images)

    def hybrid_render(self):
        fg_images = self.render_rgb_array(
            return_alpha=True
        )  # (n_env, h, w, 3)
        images = []
        for key in self.bg_images:
            if "render_camera" not in key:
                continue
            env_idx = int(key.split("-env")[-1])
            rgba = alpha_blend_rgba(
                fg_images[env_idx].cpu().numpy(), self.bg_images[key]
            )
            images.append(self.image_transform(rgba))
        images = torch.stack(images, dim=0)
        images = images.permute(0, 2, 3, 1)
        return images[..., :3]

    def evaluate(self):
        obj_to_goal_pos = (
            self.obj.pose.p
        )  # self.goal_site.pose.p - self.obj.pose.p
        is_obj_placed = (
            torch.linalg.norm(obj_to_goal_pos, axis=1) <= self.goal_thresh
        )
        is_grasped = self.agent.is_grasping(self.obj)
        is_robot_static = self.agent.is_static(0.2)
        return dict(
            is_grasped=is_grasped,
            obj_to_goal_pos=obj_to_goal_pos,
            is_obj_placed=is_obj_placed,
            is_robot_static=is_robot_static,
            is_grasping=self.agent.is_grasping(self.obj),
            success=torch.logical_and(is_obj_placed, is_robot_static),
        )

    def _get_obs_extra(self, info: dict):
        return dict()

    def compute_dense_reward(self, obs: any, action: torch.Tensor, info: dict):
        tcp_to_obj_dist = torch.linalg.norm(
            self.obj.pose.p - self.agent.tcp.pose.p, axis=1
        )
        reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
        reward = reaching_reward
        is_grasped = info["is_grasped"]
        reward += is_grasped
        # obj_to_goal_dist = torch.linalg.norm(
        #     self.goal_site.pose.p - self.obj.pose.p, axis=1
        # )
        obj_to_goal_dist = torch.linalg.norm(
            self.obj.pose.p - self.obj.pose.p, axis=1
        )
        place_reward = 1 - torch.tanh(5 * obj_to_goal_dist)
        reward += place_reward * is_grasped
        reward += info["is_obj_placed"] * is_grasped
        static_reward = 1 - torch.tanh(
            5
            * torch.linalg.norm(self.agent.robot.get_qvel()[..., :-2], axis=1)
        )
        reward += static_reward * info["is_obj_placed"] * is_grasped
        reward[info["success"]] = 6
        return reward

    def compute_normalized_dense_reward(
        self, obs: any, action: torch.Tensor, info: dict
    ):
        return self.compute_dense_reward(obs=obs, action=action, info=info) / 6


@@ -137,7 +137,7 @@ def entrypoint() -> None:
     sim_cli(
         layout_path=layout_path,
         output_dir=output_root,
-        robot_name="franka" if args.insert_robot else None,
+        insert_robot=args.insert_robot,
     )
     torch.cuda.empty_cache()


@@ -58,7 +58,7 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "false"
 os.environ["SPCONV_ALGO"] = "native"
 random.seed(0)
-logger.info("Loading Models...")
+logger.info("Loading Image3D Models...")
 DELIGHT = DelightingModel()
 IMAGESR_MODEL = ImageRealESRGAN(outscale=4)
 RBG_REMOVER = RembgRemover()
@@ -107,6 +107,7 @@ def parse_args():
         type=int,
         default=2,
     )
+    parser.add_argument("--disable_decompose_convex", action="store_true")
     args, unknown = parser.parse_known_args()
     return args
@@ -251,7 +252,11 @@ def entrypoint(**kwargs):
     mesh_glb_path = os.path.join(output_root, f"(unknown).glb")
     mesh.export(mesh_glb_path)
-    urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+    urdf_convertor = URDFGenerator(
+        GPT_CLIENT,
+        render_view_num=4,
+        decompose_convex=not args.disable_decompose_convex,
+    )
     asset_attrs = {
         "version": VERSION,
         "gs_model": f"{urdf_convertor.output_mesh_dir}/(unknown)_gs.ply",


@@ -0,0 +1,148 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.
from embodied_gen.utils.monkey_patches import monkey_patch_maniskill
monkey_patch_maniskill()
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal
import gymnasium as gym
import numpy as np
import torch
import tyro
from mani_skill.utils.wrappers import RecordEpisode
from tqdm import tqdm
import embodied_gen.envs.pick_embodiedgen
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.log import logger
from embodied_gen.utils.simulation import FrankaPandaGrasper
@dataclass
class ParallelSimConfig:
"""CLI parameters for Parallel Sapien simulation."""
# Environment configuration
layout_file: str
"""Path to the layout JSON file"""
output_dir: str
"""Directory to save recorded videos"""
gym_env_name: str = "PickEmbodiedGen-v1"
"""Name of the Gym environment to use"""
num_envs: int = 4
"""Number of parallel environments"""
render_mode: Literal["rgb_array", "hybrid"] = "hybrid"
"""Rendering mode: rgb_array or hybrid"""
enable_shadow: bool = True
"""Whether to enable shadows in rendering"""
control_mode: str = "pd_joint_pos"
"""Control mode for the agent"""
# Recording configuration
max_steps_per_video: int = 1000
"""Maximum steps to record per video"""
save_trajectory: bool = False
"""Whether to save trajectory data"""
# Simulation parameters
seed: int = 0
"""Random seed for environment reset"""
warmup_steps: int = 50
"""Number of warmup steps before action computation"""
reach_target_only: bool = True
"""Whether to only reach target without full action"""
def entrypoint(**kwargs):
if kwargs is None or len(kwargs) == 0:
cfg = tyro.cli(ParallelSimConfig)
else:
cfg = ParallelSimConfig(**kwargs)
env = gym.make(
cfg.gym_env_name,
num_envs=cfg.num_envs,
render_mode=cfg.render_mode,
enable_shadow=cfg.enable_shadow,
layout_file=cfg.layout_file,
control_mode=cfg.control_mode,
)
env = RecordEpisode(
env,
cfg.output_dir,
max_steps_per_video=cfg.max_steps_per_video,
save_trajectory=cfg.save_trajectory,
)
env.reset(seed=cfg.seed)
default_action = env.unwrapped.agent.init_qpos[:, :8]
for _ in tqdm(range(cfg.warmup_steps), desc="SIM Warmup"):
# action = env.action_space.sample() # Random action
obs, reward, terminated, truncated, info = env.step(default_action)
grasper = FrankaPandaGrasper(
env.unwrapped.agent,
env.unwrapped.sim_config.control_freq,
)
layout_data = LayoutInfo.from_dict(json.load(open(cfg.layout_file, "r")))
actions = defaultdict(list)
# Plan a grasp reach pose for each manipulated object in each env.
for env_idx in range(env.num_envs):
actors = env.unwrapped.env_actors[f"env{env_idx}"]
for node in layout_data.relation[
Scene3DItemEnum.MANIPULATED_OBJS.value
]:
action = grasper.compute_grasp_action(
actor=actors[node]._objs[0],
reach_target_only=True,
env_idx=env_idx,
)
actions[node].append(action)
# Execute the planned actions for each manipulated object in each env.
for node in actions:
max_env_steps = 0
for env_idx in range(env.num_envs):
if actions[node][env_idx] is None:
continue
max_env_steps = max(max_env_steps, len(actions[node][env_idx]))
action_tensor = np.ones(
(max_env_steps, env.num_envs, env.action_space.shape[-1])
)
action_tensor *= default_action[None, ...]
for env_idx in range(env.num_envs):
action = actions[node][env_idx]
if action is None:
continue
action_tensor[: len(action), env_idx, :] = action
for step in tqdm(range(max_env_steps), desc=f"Grasping: {node}"):
action = torch.Tensor(action_tensor[step]).to(env.unwrapped.device)
env.unwrapped.agent.set_action(action)
obs, reward, terminated, truncated, info = env.step(action)
env.close()
logger.info(f"Results saved in {cfg.output_dir}")
if __name__ == "__main__":
entrypoint()
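The execution loop above merges per-env planned trajectories of unequal lengths into one `(steps, envs, action_dim)` tensor, holding envs without a plan (or past the end of a shorter plan) at the default action. A standalone sketch of that padding logic (the helper name is illustrative, not part of the codebase):

```python
import numpy as np

def pad_env_actions(env_actions, default_action):
    """Pad per-env action sequences (or None) into one (T, N, D) tensor.

    env_actions: list of arrays with shape (T_i, D), or None for envs
        that have no plan.
    default_action: array of shape (N, D) used to fill missing steps.
    """
    num_envs, dim = default_action.shape
    max_steps = max((len(a) for a in env_actions if a is not None), default=0)
    # Broadcast the default hold action across all steps, then overwrite
    # the leading steps of each env that actually has a plan.
    out = np.ones((max_steps, num_envs, dim)) * default_action[None, ...]
    for i, a in enumerate(env_actions):
        if a is None:
            continue
        out[: len(a), i, :] = a
    return out
```

Envs whose plan is shorter than `max_steps` simply hold the default action for the remaining steps, which mirrors how the tensor is consumed one step at a time in the loop above.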

View File

@@ -35,6 +35,9 @@ from embodied_gen.utils.simulation import (
     SIM_COORD_ALIGN,
     FrankaPandaGrasper,
     SapienSceneManager,
+    load_assets_from_layout_file,
+    load_mani_skill_robot,
+    render_images,
 )
@@ -50,8 +53,8 @@ class SapienSimConfig:
         default_factory=lambda: [0.7071, 0, 0, 0.7071]
     )  # xyzw
     device: str = "cuda"
-    robot_name: str | None = None
-    control_freq: int = 40
+    control_freq: int = 50
+    insert_robot: bool = False
     # Camera settings.
     render_interval: int = 10
     num_cameras: int = 3
@@ -91,48 +94,41 @@ def entrypoint(**kwargs):
         layout_data = json.load(f)
     layout_data: LayoutInfo = LayoutInfo.from_dict(layout_data)
-    scene_manager.load_assets_from_layout_file(
+    actors = load_assets_from_layout_file(
+        scene_manager.scene,
         layout_data,
         cfg.z_offset,
         cfg.init_quat,
-        cfg.robot_name,
-        cfg.control_freq,
     )
+    agent = load_mani_skill_robot(
+        scene_manager.scene, layout_data, cfg.control_freq
+    )
     frames = defaultdict(list)
     image_cnt = 0
     for step in tqdm(range(cfg.sim_step), desc="Simulation"):
         scene_manager.scene.step()
-        if cfg.robot_name is not None:
-            scene_manager.robot.reset(scene_manager.robot.init_qpos)
+        agent.reset(agent.init_qpos)
         if step % cfg.render_interval != 0:
             continue
         scene_manager.scene.update_render()
         image_cnt += 1
         for camera in scene_manager.cameras:
             camera.take_picture()
-            images = scene_manager.render_images(
-                camera, render_keys=cfg.render_keys
-            )
+            images = render_images(camera, cfg.render_keys)
             frames[camera.name].append(images)
-    grasp_frames = defaultdict(dict)
-    if cfg.robot_name is not None:
-        sim_steps_per_control = cfg.sim_freq // cfg.control_freq
-        control_timestep = 1.0 / cfg.control_freq
+    actions = dict()
+    if cfg.insert_robot:
         grasper = FrankaPandaGrasper(
-            scene_manager.robot,
-            scene_manager,
-            sim_steps_per_control,
-            control_timestep,
+            agent,
+            cfg.control_freq,
         )
         for node in layout_data.relation[
             Scene3DItemEnum.MANIPULATED_OBJS.value
         ]:
-            grasp_frames[node] = grasper.render_grasp(
-                scene_manager.actors[node],
-                scene_manager.cameras,
-                cfg.render_keys,
+            actions[node] = grasper.compute_grasp_action(
+                actor=actors[node], reach_target_only=True
             )
     if "Foreground" not in cfg.render_keys:
@@ -160,22 +156,36 @@ def entrypoint(**kwargs):
         bg_images[camera.name] = result.rgb[..., ::-1]
     video_frames = []
-    for camera in scene_manager.cameras:
+    for idx, camera in enumerate(scene_manager.cameras):
         # Scene rendering
-        for step in tqdm(range(image_cnt), desc="Rendering"):
+        if idx == 0:
+            for step in range(image_cnt):
                 rgba = alpha_blend_rgba(
                     frames[camera.name][step]["Foreground"],
                     bg_images[camera.name],
                 )
                 video_frames.append(np.array(rgba))
         # Grasp rendering
-        for node in grasp_frames:
-            for frame in grasp_frames[node][camera.name]:
+        for node in actions:
+            if actions[node] is None:
+                continue
+            for action in tqdm(actions[node]):
+                grasp_frames = scene_manager.step_action(
+                    agent,
+                    torch.Tensor(action[None, ...]),
+                    scene_manager.cameras,
+                    cfg.render_keys,
+                    sim_steps_per_control=cfg.sim_freq // cfg.control_freq,
+                )
                 rgba = alpha_blend_rgba(
-                    frame["Foreground"], bg_images[camera.name]
+                    grasp_frames[camera.name][0]["Foreground"],
+                    bg_images[camera.name],
                 )
                 video_frames.append(np.array(rgba))
+        agent.reset(agent.init_qpos)
     os.makedirs(cfg.output_dir, exist_ok=True)
     video_path = f"{cfg.output_dir}/Iscene.mp4"
     imageio.mimsave(video_path, video_frames, fps=30)

View File

@@ -42,7 +42,7 @@ from embodied_gen.validators.quality_checkers import (
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
 random.seed(0)
-logger.info("Loading Models...")
+logger.info("Loading TEXT2IMG_MODEL...")
 SEMANTIC_CHECKER = SemanticConsistChecker(GPT_CLIENT)
 SEG_CHECKER = ImageSegChecker(GPT_CLIENT)
 TXTGEN_CHECKER = TextGenAlignChecker(GPT_CLIENT)
@@ -170,6 +170,7 @@ def text_to_3d(**kwargs) -> dict:
             seed=random.randint(0, 100000) if seed_3d is None else seed_3d,
             n_retry=args.n_asset_retry,
             keep_intermediate=args.keep_intermediate,
+            disable_decompose_convex=args.disable_decompose_convex,
         )
         mesh_path = f"{node_save_dir}/result/mesh/{save_node}.obj"
         image_path = render_asset3d(
@@ -270,6 +271,7 @@ def parse_args():
         help="Random seed for 3D generation",
     )
     parser.add_argument("--keep_intermediate", action="store_true")
+    parser.add_argument("--disable_decompose_convex", action="store_true")
     args, unknown = parser.parse_known_args()

View File

@@ -18,6 +18,7 @@ import os
 import sys
 import zipfile
+import numpy as np
 import torch
 from huggingface_hub import hf_hub_download
 from omegaconf import OmegaConf
@@ -150,3 +151,68 @@ def monkey_patch_pano2room():
         self.inpaint_pipe = pipe
     SDFTInpainter.__init__ = patched_sd_inpaint_init
def monkey_patch_maniskill():
from mani_skill.envs.scene import ManiSkillScene
def get_sensor_images(
self, obs: dict[str, any]
) -> dict[str, dict[str, torch.Tensor]]:
sensor_data = dict()
for name, sensor in self.sensors.items():
sensor_data[name] = sensor.get_images(obs[name])
return sensor_data
def get_human_render_camera_images(
self, camera_name: str = None, return_alpha: bool = False
) -> dict[str, torch.Tensor]:
def get_rgba_tensor(camera, return_alpha):
color = camera.get_obs(
rgb=True, depth=False, segmentation=False, position=False
)["rgb"]
if return_alpha:
seg_labels = camera.get_obs(
rgb=False, depth=False, segmentation=True, position=False
)["segmentation"]
masks = np.where((seg_labels.cpu() > 0), 255, 0).astype(
np.uint8
)
masks = torch.tensor(masks).to(color.device)
color = torch.concat([color, masks], dim=-1)
return color
image_data = dict()
if self.gpu_sim_enabled:
if self.parallel_in_single_scene:
for name, camera in self.human_render_cameras.items():
camera.camera._render_cameras[0].take_picture()
rgba = get_rgba_tensor(camera, return_alpha)
image_data[name] = rgba
else:
for name, camera in self.human_render_cameras.items():
if camera_name is not None and name != camera_name:
continue
assert camera.config.shader_config.shader_pack not in [
"rt",
"rt-fast",
"rt-med",
], "ray tracing shaders do not work with parallel rendering"
camera.capture()
rgba = get_rgba_tensor(camera, return_alpha)
image_data[name] = rgba
else:
for name, camera in self.human_render_cameras.items():
if camera_name is not None and name != camera_name:
continue
camera.capture()
rgba = get_rgba_tensor(camera, return_alpha)
image_data[name] = rgba
return image_data
ManiSkillScene.get_sensor_images = get_sensor_images
ManiSkillScene.get_human_render_camera_images = (
get_human_render_camera_images
)
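`monkey_patch_maniskill` rebinds the two methods on `ManiSkillScene` at import time so downstream code picks up the alpha-aware rendering without forking ManiSkill. The same pattern in isolation, on a toy class:

```python
class Camera:
    def capture(self):
        return {"rgb": [0, 0, 0]}

def monkey_patch_camera():
    # Rebind the method on the class itself; every existing and future
    # instance now dispatches to the patched version.
    def capture(self):
        return {"rgb": [0, 0, 0], "alpha": [255]}
    Camera.capture = capture

monkey_patch_camera()
```

Patching the class (rather than one instance) is what lets the patched `get_human_render_camera_images` above take effect for every scene ManiSkill creates afterwards.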

View File

@@ -14,7 +14,7 @@
 # implied. See the License for the specific language governing
 # permissions and limitations under the License.
+import json
 import logging
 import os
 import xml.etree.ElementTree as ET
@@ -24,19 +24,20 @@ from typing import Literal
 import mplib
 import numpy as np
 import sapien.core as sapien
+import sapien.physx as physx
 import torch
-from PIL import Image, ImageColor
-from scipy.spatial.transform import Rotation as R
-from tqdm import tqdm
-from embodied_gen.data.utils import DiffrastRender
-from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
-from embodied_gen.utils.geometry import quaternion_multiply
-logger = logging.getLogger(__name__)
+from mani_skill.agents.base_agent import BaseAgent
+from mani_skill.envs.scene import ManiSkillScene
 from mani_skill.examples.motionplanning.panda.utils import (
     compute_grasp_info_by_obb,
 )
 from mani_skill.utils.geometry.trimesh_utils import get_component_mesh
+from PIL import Image, ImageColor
+from scipy.spatial.transform import Rotation as R
+from embodied_gen.data.utils import DiffrastRender
+from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
+from embodied_gen.utils.geometry import quaternion_multiply
+from embodied_gen.utils.log import logger
 COLORMAP = list(set(ImageColor.colormap.values()))
 COLOR_PALETTE = np.array(
@@ -54,10 +55,267 @@ SIM_COORD_ALIGN = np.array(
 __all__ = [
     "SIM_COORD_ALIGN",
     "FrankaPandaGrasper",
-    "SimpleGrasper",
+    "load_assets_from_layout_file",
+    "load_mani_skill_robot",
+    "render_images",
 ]
def load_actor_from_urdf(
scene: ManiSkillScene | sapien.Scene,
file_path: str,
pose: sapien.Pose,
env_idx: int = None,
use_static: bool = False,
update_mass: bool = False,
) -> sapien.pysapien.Entity:
tree = ET.parse(file_path)
root = tree.getroot()
node_name = root.get("name")
file_dir = os.path.dirname(file_path)
visual_mesh = root.find('.//visual/geometry/mesh')
visual_file = visual_mesh.get("filename")
visual_scale = visual_mesh.get("scale", "1.0 1.0 1.0")
visual_scale = np.array([float(x) for x in visual_scale.split()])
collision_mesh = root.find('.//collision/geometry/mesh')
collision_file = collision_mesh.get("filename")
collision_scale = collision_mesh.get("scale", "1.0 1.0 1.0")
collision_scale = np.array([float(x) for x in collision_scale.split()])
visual_file = os.path.join(file_dir, visual_file)
collision_file = os.path.join(file_dir, collision_file)
static_fric = root.find('.//collision/gazebo/mu1').text
dynamic_fric = root.find('.//collision/gazebo/mu2').text
material = physx.PhysxMaterial(
static_friction=np.clip(float(static_fric), 0.1, 0.7),
dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
restitution=0.05,
)
builder = scene.create_actor_builder()
body_type = "static" if use_static else "dynamic"
builder.set_physx_body_type(body_type)
builder.add_multiple_convex_collisions_from_file(
collision_file if body_type == "dynamic" else visual_file,
material=material,
scale=collision_scale if body_type == "dynamic" else visual_scale,
# decomposition="coacd",
# decomposition_params=dict(
# threshold=0.05, max_convex_hull=64, verbose=False
# ),
)
builder.add_visual_from_file(visual_file, scale=visual_scale)
builder.set_initial_pose(pose)
if isinstance(scene, ManiSkillScene) and env_idx is not None:
builder.set_scene_idxs([env_idx])
actor = builder.build(name=f"{node_name}-{env_idx}")
if update_mass and hasattr(actor.components[1], "mass"):
node_mass = float(root.find('.//inertial/mass').get("value"))
actor.components[1].set_mass(node_mass)
return actor
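`load_actor_from_urdf` reads the mesh paths, scales, and Gazebo friction tags straight out of the URDF XML with `xml.etree.ElementTree` instead of going through a full URDF loader. The same queries can be exercised on a toy document (the URDF contents below are illustrative, not from the codebase):

```python
import xml.etree.ElementTree as ET

URDF = """<robot name="mug">
  <link name="base">
    <visual>
      <geometry><mesh filename="mesh/visual.obj" scale="0.5 0.5 0.5"/></geometry>
    </visual>
    <collision>
      <geometry><mesh filename="mesh/collision.obj"/></geometry>
      <gazebo><mu1>0.9</mu1><mu2>0.8</mu2></gazebo>
    </collision>
  </link>
</robot>"""

root = ET.fromstring(URDF)
name = root.get("name")
visual = root.find(".//visual/geometry/mesh")
visual_scale = [float(x) for x in visual.get("scale", "1.0 1.0 1.0").split()]
collision = root.find(".//collision/geometry/mesh")
# The scale attribute is optional; fall back to unit scale as the loader does.
collision_scale = [
    float(x) for x in collision.get("scale", "1.0 1.0 1.0").split()
]
static_fric = float(root.find(".//collision/gazebo/mu1").text)
```

The `.//` XPath prefix searches the whole subtree, so the queries work regardless of how deeply the `visual`/`collision` elements are nested under `link`.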
def load_assets_from_layout_file(
scene: ManiSkillScene | sapien.Scene,
layout: LayoutInfo | str,
z_offset: float = 0.0,
init_quat: list[float] = [0, 0, 0, 1],
env_idx: int = None,
) -> dict[str, sapien.pysapien.Entity]:
"""Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.
Args:
scene (sapien.Scene | ManiSkillScene): The SAPIEN or ManiSkill scene to load assets into.
layout (LayoutInfo): The layout information data.
z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
env_idx (int): Environment index for multi-environment setup.
"""
if isinstance(layout, str) and layout.endswith(".json"):
layout = LayoutInfo.from_dict(json.load(open(layout, "r")))
actors = dict()
for node in layout.assets:
file_dir = layout.assets[node]
file_name = f"{node.replace(' ', '_')}.urdf"
urdf_file = os.path.join(file_dir, file_name)
if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
continue
position = layout.position[node].copy()
if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
position[2] += z_offset
use_static = (
layout.relation.get(Scene3DItemEnum.CONTEXT.value, None) == node
)
# Combine initial quaternion with object quaternion
x, y, z, qx, qy, qz, qw = position
qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
actor = load_actor_from_urdf(
scene,
urdf_file,
sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
env_idx,
use_static=use_static,
update_mass=False,
)
actors[node] = actor
return actors
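The loader composes each object's layout quaternion with `init_quat` via `quaternion_multiply` before building the SAPIEN pose (note SAPIEN's `Pose` takes wxyz while the layout stores xyzw). A minimal xyzw Hamilton product for reference; the project's helper may use a different argument convention:

```python
import numpy as np

def quat_multiply_xyzw(q1, q2):
    """Hamilton product q1 * q2 for quaternions stored as (x, y, z, w)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ])
```

As a sanity check, composing two 90-degree rotations about z yields the 180-degree rotation `(0, 0, 1, 0)`.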
def load_mani_skill_robot(
scene: sapien.Scene | ManiSkillScene,
layout: LayoutInfo | str,
control_freq: int = 20,
robot_init_qpos_noise: float = 0.0,
control_mode: str = "pd_joint_pos",
backend_str: tuple[str, str] = ("cpu", "gpu"),
) -> BaseAgent:
from mani_skill.agents import REGISTERED_AGENTS
from mani_skill.envs.scene import ManiSkillScene
from mani_skill.envs.utils.system.backend import (
parse_sim_and_render_backend,
)
if isinstance(layout, str) and layout.endswith(".json"):
layout = LayoutInfo.from_dict(json.load(open(layout, "r")))
robot_name = layout.relation[Scene3DItemEnum.ROBOT.value]
x, y, z, qx, qy, qz, qw = layout.position[robot_name]
delta_z = 0.002 # Add small offset to avoid collision.
pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
if robot_name not in REGISTERED_AGENTS:
logger.warning(
    f"Robot `{robot_name}` is not registered; choose from "
    f"{REGISTERED_AGENTS.keys()}. Falling back to `panda`."
)
robot_name = "panda"
ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
backend = parse_sim_and_render_backend(*backend_str)
if isinstance(scene, sapien.Scene):
scene = ManiSkillScene([scene], device=backend_str[0], backend=backend)
robot = ROBOT_CLS(
scene=scene,
control_freq=control_freq,
control_mode=control_mode,
initial_pose=pose,
)
# Set robot initial joint positions in radians (joint0-joint6 plus 2 finger joints).
qpos = np.array(
[
0.0,
np.pi / 8,
0,
-np.pi * 3 / 8,
0,
np.pi * 3 / 4,
np.pi / 4,
0.04,
0.04,
]
)
qpos = (
np.random.normal(
0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
)
+ qpos
)
qpos[:, -2:] = 0.04
robot.reset(qpos)
robot.init_qpos = robot.robot.qpos
robot.controller.controllers["gripper"].reset()
return robot
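`load_mani_skill_robot` perturbs a canonical ready pose with per-env Gaussian noise, then pins the two finger joints open at 0.04 so the gripper always starts released. That initialization step, sketched independently (the helper name is illustrative):

```python
import numpy as np

def noisy_init_qpos(base_qpos, num_envs, noise_std, seed=None):
    """Sample a (num_envs, dof) batch of start poses around base_qpos."""
    rng = np.random.default_rng(seed)
    qpos = base_qpos + rng.normal(0.0, noise_std, (num_envs, len(base_qpos)))
    qpos[:, -2:] = 0.04  # keep both gripper fingers fully open
    return qpos
```

Overwriting the last two entries after adding noise matters: jittering the finger joints could start the gripper partially closed and make the first grasp nondeterministic.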
def render_images(
camera: sapien.render.RenderCameraComponent,
render_keys: list[
Literal[
"Color",
"Segmentation",
"Normal",
"Mask",
"Depth",
"Foreground",
]
] = None,
) -> dict[str, Image.Image]:
"""Render images from a given sapien camera.
Args:
camera (sapien.render.RenderCameraComponent): The camera to render from.
render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).
Returns:
Dict[str, Image.Image]: Dictionary of rendered images.
"""
if render_keys is None:
render_keys = [
"Color",
"Segmentation",
"Normal",
"Mask",
"Depth",
"Foreground",
]
results: dict[str, Image.Image] = {}
if "Color" in render_keys:
color = camera.get_picture("Color")
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
results["Color"] = Image.fromarray(color_rgb)
if "Mask" in render_keys:
alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
results["Mask"] = Image.fromarray(alpha)
if "Segmentation" in render_keys:
seg_labels = camera.get_picture("Segmentation")
label0 = seg_labels[..., 0].astype(np.uint8)
seg_color = COLOR_PALETTE[label0]
results["Segmentation"] = Image.fromarray(seg_color)
if "Foreground" in render_keys:
seg_labels = camera.get_picture("Segmentation")
label0 = seg_labels[..., 0]
mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
color = camera.get_picture("Color")
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
results["Foreground"] = Image.fromarray(foreground)
if "Normal" in render_keys:
normal = camera.get_picture("Normal")[..., :3]
normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
results["Normal"] = Image.fromarray(normal_img)
if "Depth" in render_keys:
position_map = camera.get_picture("Position")
depth = -position_map[..., 2]
alpha = torch.tensor(color[..., 3], dtype=torch.float32)
norm_depth = DiffrastRender.normalize_map_by_mask(
torch.tensor(depth), alpha
)
depth_img = (norm_depth * 255).to(torch.uint8).numpy()
results["Depth"] = Image.fromarray(depth_img)
return results
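The `Depth` branch above normalizes metric depth into [0, 1] only over valid pixels via `DiffrastRender.normalize_map_by_mask`. A plausible numpy equivalent, assuming a masked min-max normalization (the actual helper may differ):

```python
import numpy as np

def normalize_depth_by_mask(depth, mask):
    """Min-max normalize depth over mask > 0 pixels; background stays 0."""
    valid = mask > 0
    if not valid.any():
        return np.zeros_like(depth, dtype=np.float64)
    d_min = depth[valid].min()
    d_max = depth[valid].max()
    # Guard against a constant depth map to avoid division by zero.
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    return np.where(valid, np.clip(norm, 0.0, 1.0), 0.0)
```

Restricting the min/max to masked pixels keeps far-plane background values from flattening the usable depth range of the foreground.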
 class SapienSceneManager:
     """A class to manage SAPIEN simulator."""
@@ -81,11 +339,6 @@ class SapienSceneManager:
         sapien.render.set_ray_tracing_path_depth(10)
         sapien.render.set_ray_tracing_denoiser("oidn")
-        # TODO: Support GPU.
-        # import sapien.physx as physx
-        # physx.enable_gpu()
-        # scene = sapien.Scene(systems=[sapien.physx.PhysxGpuSystem(), sapien.render.RenderSystem()])
         scene = sapien.Scene()
         scene.set_timestep(1 / self.sim_freq)
@@ -114,12 +367,13 @@ class SapienSceneManager:
     def step_action(
         self,
+        agent: BaseAgent,
         action: torch.Tensor,
-        sim_steps_per_control: int,
         cameras: list[sapien.render.RenderCameraComponent],
         render_keys: list[str],
+        sim_steps_per_control: int = 1,
     ) -> dict:
-        self.robot.set_action(action)
+        agent.set_action(action)
         frames = defaultdict(list)
         for _ in range(sim_steps_per_control):
             self.scene.step()
@@ -127,125 +381,11 @@ class SapienSceneManager:
             self.scene.update_render()
             for camera in cameras:
                 camera.take_picture()
-                images = self.render_images(camera, render_keys=render_keys)
+                images = render_images(camera, render_keys=render_keys)
                 frames[camera.name].append(images)
         return frames
def load_exists_robot(
self,
robot_name: str,
pose: sapien.Pose,
control_freq: int = 20,
robot_init_qpos_noise: float = 0.02,
control_mode: str = "pd_joint_pos",
backend_str: tuple[str, str] = ("cpu", "gpu"),
):
from mani_skill.agents import REGISTERED_AGENTS
from mani_skill.envs.scene import ManiSkillScene
from mani_skill.envs.utils.system.backend import (
parse_sim_and_render_backend,
)
if robot_name == "franka":
robot_name = "panda" # remapping
if robot_name not in REGISTERED_AGENTS:
raise KeyError(
f"Robot `{robot_name}` not registered, chosen from {REGISTERED_AGENTS.keys()}."
)
ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
# backend = parse_sim_and_render_backend("physx_cuda", "gpu")
backend = parse_sim_and_render_backend(*backend_str)
scene = ManiSkillScene(
[self.scene], device=backend_str[0], backend=backend
)
robot = ROBOT_CLS(
scene=scene,
control_freq=control_freq,
control_mode=control_mode,
initial_pose=pose,
)
# Set robot init joint rad agree(joint0 to joint6 w 2 finger).
qpos = np.array(
[
0.0,
np.pi / 8,
0,
-np.pi * 3 / 8,
0,
np.pi * 3 / 4,
np.pi / 4,
0.04,
0.04,
]
)
qpos = (
np.random.normal(
0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
)
+ qpos
)
qpos[:, -2:] = 0.04
robot.reset(qpos)
robot.init_qpos = robot.robot.qpos
robot.controller.controllers["gripper"].reset()
return robot
def load_actor_from_urdf(
self,
file_path: str,
pose: sapien.Pose,
use_static: bool = False,
update_mass: bool = False,
) -> sapien.pysapien.Entity:
"""Load an actor from a URDF file into the scene.
Args:
file_path (str): Path to the URDF file.
pose (sapien.Pose): Pose of the actor.
use_static (bool): Whether to create a static actor.
update_mass (bool): Whether to update the actor's mass from the URDF.
Returns:
sapien.pysapien.Entity: The loaded actor.
"""
tree = ET.parse(file_path)
root = tree.getroot()
node_name = root.get("name")
file_dir = os.path.dirname(file_path)
visual_file = root.find('.//visual/geometry/mesh').get("filename")
collision_file = root.find('.//collision/geometry/mesh').get(
"filename"
)
visual_file = os.path.join(file_dir, visual_file)
collision_file = os.path.join(file_dir, collision_file)
static_fric = root.find('.//collision/gazebo/mu1').text
dynamic_fric = root.find('.//collision/gazebo/mu2').text
material = self.scene.create_physical_material(
static_friction=np.clip(float(static_fric), 0.1, 0.7),
dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
restitution=0.05,
)
builder = self.scene.create_actor_builder()
body_type = "static" if use_static else "dynamic"
builder.set_physx_body_type(body_type)
builder.add_multiple_convex_collisions_from_file(
collision_file, material=material
)
builder.add_visual_from_file(visual_file)
actor = builder.build(name=node_name)
actor.set_name(node_name)
actor.set_pose(pose)
if update_mass and hasattr(actor.components[1], "mass"):
node_mass = float(root.find('.//inertial/mass').get("value"))
actor.components[1].set_mass(node_mass)
return actor
def create_camera( def create_camera(
self, self,
cam_name: str, cam_name: str,
@ -344,179 +484,32 @@ class SapienSceneManager:
return self.cameras return self.cameras
def render_images(
self,
camera: sapien.render.RenderCameraComponent,
render_keys: list[
Literal[
"Color",
"Segmentation",
"Normal",
"Mask",
"Depth",
"Foreground",
]
] = None,
) -> dict[str, Image.Image]:
"""Render images from a given camera.
Args:
camera (sapien.render.RenderCameraComponent): The camera to render from.
render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).
Returns:
Dict[str, Image.Image]: Dictionary of rendered images.
"""
if render_keys is None:
render_keys = [
"Color",
"Segmentation",
"Normal",
"Mask",
"Depth",
"Foreground",
]
results: dict[str, Image.Image] = {}
if "Color" in render_keys:
color = camera.get_picture("Color")
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
results["Color"] = Image.fromarray(color_rgb)
if "Mask" in render_keys:
alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
results["Mask"] = Image.fromarray(alpha)
if "Segmentation" in render_keys:
seg_labels = camera.get_picture("Segmentation")
label0 = seg_labels[..., 0].astype(np.uint8)
seg_color = self.COLOR_PALETTE[label0]
results["Segmentation"] = Image.fromarray(seg_color)
if "Foreground" in render_keys:
seg_labels = camera.get_picture("Segmentation")
label0 = seg_labels[..., 0]
mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
color = camera.get_picture("Color")
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
results["Foreground"] = Image.fromarray(foreground)
if "Normal" in render_keys:
normal = camera.get_picture("Normal")[..., :3]
normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
results["Normal"] = Image.fromarray(normal_img)
if "Depth" in render_keys:
position_map = camera.get_picture("Position")
depth = -position_map[..., 2]
alpha = torch.tensor(color[..., 3], dtype=torch.float32)
norm_depth = DiffrastRender.normalize_map_by_mask(
torch.tensor(depth), alpha
)
depth_img = (norm_depth * 255).to(torch.uint8).numpy()
results["Depth"] = Image.fromarray(depth_img)
return results
def load_assets_from_layout_file(
self,
layout: LayoutInfo,
z_offset: float = 0.0,
init_quat: list[float] = [0, 0, 0, 1],
robot_name: str = None,
control_freq: int = 40,
robot_init_qpos_noise: float = 0,
) -> None:
"""Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.
Args:
layout (LayoutInfo): The layout information data.
z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
robot_name (str): Name of the robot to load, None not load robot.
control_freq (int): Control frequency for the robot, default is 40.
robot_init_qpos_noise (float): Noise to add to the robot's initial joint positions, default is 0.
"""
for node in layout.assets:
file_dir = layout.assets[node]
file_name = f"{node.replace(' ', '_')}.urdf"
urdf_file = os.path.join(file_dir, file_name)
if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
continue
position = layout.position[node].copy()
if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
position[2] += z_offset
use_static = (
layout.relation.get(Scene3DItemEnum.CONTEXT.value, None)
== node
)
# Combine initial quaternion with object quaternion
x, y, z, qx, qy, qz, qw = position
qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
actor = self.load_actor_from_urdf(
urdf_file,
sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
use_static=use_static,
update_mass=False,
)
self.actors[node] = actor
if robot_name is not None:
robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
x, y, z, qx, qy, qz, qw = layout.position[robot_node]
delta_z = 0.002 # Add small offset to avoid collision.
pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
self.robot = self.load_exists_robot(
robot_name, pose, control_freq, robot_init_qpos_noise
)
return
 class FrankaPandaGrasper(object):
-    """Grasper planner for Franka Panda robot arm.
-
-    Args:
-        robot: See from `mani_skill.agents.robots.panda.panda`.
-        scene: The `SapienSceneManager` object.
-        sim_steps_per_control: Number of simulation steps per control step.
-        control_timestep: Time step for planning.
-        joint_vel_limits: Velocity limit for all joints (default: 0.9)
-        joint_acc_limits: Acceleration limit for all joints (default: 0.9)
-        finger_length: Length of the robot's fingers (default: 0.025)
-    """
     def __init__(
         self,
-        robot,
-        scene: SapienSceneManager,
-        sim_steps_per_control: int,
-        control_timestep: float,
-        joint_vel_limits: float = 0.9,
-        joint_acc_limits: float = 0.9,
+        agent: BaseAgent,
+        control_freq: float,
+        joint_vel_limits: float = 2.0,
+        joint_acc_limits: float = 1.0,
         finger_length: float = 0.025,
     ) -> None:
-        self.agent = robot
-        self.robot = robot.robot
-        self.scene = scene
-        self.sim_steps_per_control = sim_steps_per_control
-        self.control_timestep = control_timestep
+        self.agent = agent
+        self.robot = agent.robot
+        self.control_freq = control_freq
+        self.control_timestep = 1 / control_freq
         self.joint_vel_limits = joint_vel_limits
         self.joint_acc_limits = joint_acc_limits
         self.finger_length = finger_length
-        self.planner = self._setup_planner()
+        self.planners = self._setup_planner()
     def _setup_planner(self) -> mplib.Planner:
+        planners = []
+        for pose in self.robot.pose:
         link_names = [link.get_name() for link in self.robot.get_links()]
         joint_names = [
             joint.get_name() for joint in self.robot.get_active_joints()
         ]
         planner = mplib.Planner(
             urdf=self.agent.urdf_path,
             srdf=self.agent.urdf_path.replace(".urdf", ".srdf"),
@@ -526,75 +519,35 @@ class FrankaPandaGrasper(object):
             joint_vel_limits=np.ones(7) * self.joint_vel_limits,
             joint_acc_limits=np.ones(7) * self.joint_acc_limits,
         )
-        planner.set_base_pose(self.robot.pose.raw_pose[0].tolist())
+            planner.set_base_pose(pose.raw_pose[0].tolist())
+            planners.append(planner)
-        return planner
+        return planners
-    def update_state(
-        self,
-        result: dict,
-        gripper_state: Literal[-1, 1],
-        sim_steps_per_control: int,
-        cameras: list[sapien.render.RenderCameraComponent],
-        render_keys: list[str],
-    ) -> dict[str, list[Image.Image]]:
-        n_step = len(result["position"])
-        total_frames = defaultdict(list)
-        for i in tqdm(range(n_step), desc="Grasping"):
-            qpos = result["position"][min(i, n_step - 1)]
-            action = np.hstack([qpos, gripper_state])[None, ...]
-            action = torch.from_numpy(action).float()
-            frames = self.scene.step_action(
-                action, sim_steps_per_control, cameras, render_keys
-            )
-            for camera in cameras:
-                total_frames[camera.name].extend(frames[camera.name])
-        return total_frames
     def control_gripper(
         self,
-        sim_steps_per_control: int,
-        cameras: list[sapien.render.RenderCameraComponent],
-        render_keys: list[str],
         gripper_state: Literal[-1, 1],
         n_step: int = 10,
-    ) -> dict[str, list[Image.Image]]:
+    ) -> np.ndarray:
-        total_frames = defaultdict(list)
         qpos = self.robot.get_qpos()[0, :-2].cpu().numpy()
+        actions = []
         for _ in range(n_step):
             action = np.hstack([qpos, gripper_state])[None, ...]
-            action = torch.from_numpy(action).float()
-            frames = self.scene.step_action(
-                action, sim_steps_per_control, cameras, render_keys
-            )
-            for camera in cameras:
-                total_frames[camera.name].extend(frames[camera.name])
-        return total_frames
+            actions.append(action)
+        return np.concatenate(actions, axis=0)
def move_to_pose( def move_to_pose(
self, self,
pose: sapien.Pose, pose: sapien.Pose,
sim_steps_per_control: int,
control_timestep: float, control_timestep: float,
cameras: list[sapien.render.RenderCameraComponent],
render_keys: list[str],
gripper_state: Literal[-1, 1], gripper_state: Literal[-1, 1],
use_point_cloud: bool = False, use_point_cloud: bool = False,
n_max_step: int = 100, n_max_step: int = 100,
) -> dict[str, list[Image.Image]]: action_key: str = "position",
# First try screw motion planning env_idx: int = 0,
result = self.planner.plan_screw( ) -> np.ndarray:
np.concatenate([pose.p, pose.q]), result = self.planners[env_idx].plan_qpos_to_pose(
self.robot.get_qpos().cpu().numpy()[0],
time_step=control_timestep,
use_point_cloud=use_point_cloud,
)
# Fallback to standard planning if screw fails
if result["status"] != "Success":
result = self.planner.plan_qpos_to_pose(
np.concatenate([pose.p, pose.q]), np.concatenate([pose.p, pose.q]),
self.robot.get_qpos().cpu().numpy()[0], self.robot.get_qpos().cpu().numpy()[0],
time_step=control_timestep, time_step=control_timestep,
@@ -602,44 +555,42 @@ class FrankaPandaGrasper(object):
         )
         if result["status"] != "Success":
-            return defaultdict(list)
-
-        for key in result:
-            if key in ["status", "duration"]:
-                continue
-            sample_ratio = (len(result[key]) // n_max_step) + 1
-            result[key] = result[key][::sample_ratio]
-
-        frames = self.update_state(
-            result, gripper_state, sim_steps_per_control, cameras, render_keys
-        )
-        return frames
+            result = self.planners[env_idx].plan_screw(
+                np.concatenate([pose.p, pose.q]),
+                self.robot.get_qpos().cpu().numpy()[0],
+                time_step=control_timestep,
+                use_point_cloud=use_point_cloud,
+            )
+            if result["status"] != "Success":
+                return
+
+        sample_ratio = (len(result[action_key]) // n_max_step) + 1
+        result[action_key] = result[action_key][::sample_ratio]
+        n_step = len(result[action_key])
+        actions = []
+        for i in range(n_step):
+            qpos = result[action_key][i]
+            action = np.hstack([qpos, gripper_state])[None, ...]
+            actions.append(action)
+
+        return np.concatenate(actions, axis=0)

-    def render_grasp(
+    def compute_grasp_action(
         self,
         actor: sapien.pysapien.Entity,
-        cameras: list[sapien.render.RenderCameraComponent],
-        render_keys: list[
-            Literal[
-                "Color",
-                "Foreground",
-                "Segmentation",
-                "Normal",
-                "Mask",
-                "Depth",
-            ]
-        ],
         reach_target_only: bool = True,
-    ) -> dict[str, list[Image.Image]]:
+        offset: tuple[float, float, float] = [0, 0, -0.05],
+        env_idx: int = 0,
+    ) -> np.ndarray:
         physx_rigid = actor.components[1]
         mesh = get_component_mesh(physx_rigid, to_world_frame=True)
         obb = mesh.bounding_box_oriented

         approaching = np.array([0, 0, -1])
+        tcp_pose = self.agent.tcp.pose[env_idx]
         target_closing = (
-            self.agent.tcp.pose.to_transformation_matrix()[0, :3, 1]
-            .cpu()
-            .numpy()
+            tcp_pose.to_transformation_matrix()[0, :3, 1].cpu().numpy()
         )
         grasp_info = compute_grasp_info_by_obb(
             obb,
@@ -647,60 +598,46 @@ class FrankaPandaGrasper(object):
             target_closing=target_closing,
             depth=self.finger_length,
         )
-        closing, center = grasp_info["closing"], grasp_info["center"]
-        raw_tcp_pose = self.agent.tcp.pose.sp
-        grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
-        reach_pose = grasp_pose * sapien.Pose(p=[0, 0, -0.05])
-        grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
-
-        total_frames = defaultdict(list)
-        frames_list = []
-        reach_frames = self.move_to_pose(
+        closing, center = grasp_info["closing"], grasp_info["center"]
+        raw_tcp_pose = tcp_pose.sp
+        grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
+        reach_pose = grasp_pose * sapien.Pose(p=offset)
+        grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
+
+        actions = []
+        reach_actions = self.move_to_pose(
             reach_pose,
-            self.sim_steps_per_control,
             self.control_timestep,
-            cameras,
-            render_keys,
             gripper_state=1,
+            env_idx=env_idx,
         )
-        frames_list.append(reach_frames)
-        if len(reach_frames) == 0:
+        actions.append(reach_actions)
+        if reach_actions is None:
             logger.warning(
                 f"Failed to reach the grasp pose for node `{actor.name}`, skipping grasping."
             )
-            return total_frames
+            return None

         if not reach_target_only:
-            grasp_frames = self.move_to_pose(
+            grasp_actions = self.move_to_pose(
                 grasp_pose,
-                self.sim_steps_per_control,
                 self.control_timestep,
-                cameras,
-                render_keys,
                 gripper_state=1,
+                env_idx=env_idx,
             )
-            frames_list.append(grasp_frames)
-            close_frames = self.control_gripper(
-                self.sim_steps_per_control,
-                cameras,
-                render_keys,
+            actions.append(grasp_actions)
+            close_actions = self.control_gripper(
                 gripper_state=-1,
+                env_idx=env_idx,
             )
-            frames_list.append(close_frames)
-            back_frames = self.move_to_pose(
+            actions.append(close_actions)
+            back_actions = self.move_to_pose(
                 raw_tcp_pose,
-                self.sim_steps_per_control,
                 self.control_timestep,
-                cameras,
-                render_keys,
                 gripper_state=-1,
+                env_idx=env_idx,
             )
-            frames_list.append(back_frames)
+            actions.append(back_actions)

-        for frame_dict in frames_list:
-            for cam_name, frames in frame_dict.items():
-                total_frames[cam_name].extend(frames)
-        self.agent.reset(self.agent.init_qpos)
-        return total_frames
+        return np.concatenate(actions, axis=0)
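The refactor above stops rendering frames inside the grasper and instead returns batched action arrays; the trajectory subsampling in the new `move_to_pose` can be sketched in isolation (the names `n_max_step` and `gripper_state` follow the diff, while the trajectory itself is synthetic):

```python
import numpy as np

def subsample_actions(positions: np.ndarray, gripper_state: int,
                      n_max_step: int = 100) -> np.ndarray:
    # Same stride trick as move_to_pose: keep at most ~n_max_step waypoints.
    sample_ratio = (len(positions) // n_max_step) + 1
    positions = positions[::sample_ratio]
    # Append the gripper command as an extra column to each 7-DoF target.
    actions = [np.hstack([qpos, gripper_state])[None, ...] for qpos in positions]
    return np.concatenate(actions, axis=0)

traj = np.zeros((250, 7))  # synthetic 250-step joint trajectory
actions = subsample_actions(traj, gripper_state=1)
print(actions.shape)  # (84, 8): 250 steps strided by 3, plus a gripper column
```

Because `sample_ratio = len // n_max_step + 1`, the strided result always stays at or below `n_max_step` waypoints.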


@@ -1 +1 @@
-VERSION = "v0.1.3"
+VERSION = "v0.1.4"


@@ -24,6 +24,7 @@ from xml.dom.minidom import parseString

 import numpy as np
 import trimesh
+from embodied_gen.data.convex_decomposer import decompose_convex_mesh
 from embodied_gen.utils.gpt_clients import GPT_CLIENT, GPTclient
 from embodied_gen.utils.process_media import render_asset3d
 from embodied_gen.utils.tags import VERSION
@@ -84,6 +85,7 @@ class URDFGenerator(object):
         attrs_name: list[str] = None,
         render_dir: str = "urdf_renders",
         render_view_num: int = 4,
+        decompose_convex: bool = False,
     ) -> None:
         if mesh_file_list is None:
             mesh_file_list = []
@@ -133,7 +135,8 @@ class URDFGenerator(object):
        For example:
        - A pen standing upright in the first image (aligned with the image's vertical axis)
            full body visible in the first image: vertical height 0.14-0.20 m
-       - A pen lying flat in the first image (showing thickness or as a dot) vertical height 0.018-0.025 m
+       - A pen lying flat in the first image or either the tip or the tail is facing the image
+           (showing thickness or as a circle) vertical height 0.018-0.025 m
        - Tilted pen in the first image (e.g., ~45° angle): vertical height 0.07-0.12 m
        - Use the rest views to help determine the object's 3D pose and orientation.
        Assume the object is in real-world scale and estimate the approximate vertical height
@@ -156,6 +159,7 @@ class URDFGenerator(object):
             "gs_model",
         ]
         self.attrs_name = attrs_name
+        self.decompose_convex = decompose_convex

     def parse_response(self, response: str) -> dict[str, any]:
         lines = response.split("\n")
@@ -258,9 +262,24 @@ class URDFGenerator(object):
         # Update collision geometry
         collision = link.find("collision/geometry/mesh")
         if collision is not None:
-            collision.set(
-                "filename", os.path.join(self.output_mesh_dir, obj_name)
-            )
+            collision_mesh = os.path.join(self.output_mesh_dir, obj_name)
+            if self.decompose_convex:
+                try:
+                    d_params = dict(
+                        threshold=0.05, max_convex_hull=64, verbose=False
+                    )
+                    filename = f"{os.path.splitext(obj_name)[0]}_collision.ply"
+                    output_path = os.path.join(mesh_folder, filename)
+                    decompose_convex_mesh(
+                        mesh_output_path, output_path, **d_params
+                    )
+                    collision_mesh = f"{self.output_mesh_dir}/(unknown)"
+                except Exception as e:
+                    logger.warning(
+                        f"Convex decomposition failed for {output_path}, {e}."
+                        "Use original mesh for collision computation."
+                    )
+            collision.set("filename", collision_mesh)
             collision.set("scale", "1.0 1.0 1.0")

         # Update friction coefficients
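The collision-mesh rewrite above is a small ElementTree edit on the URDF; a self-contained sketch of the same pattern follows (the URDF snippet and file paths here are invented for illustration, not taken from the repo):

```python
import xml.etree.ElementTree as ET

urdf = """
<robot name="demo">
  <link name="base">
    <collision>
      <geometry>
        <mesh filename="mesh/demo.obj"/>
      </geometry>
    </collision>
  </link>
</robot>
"""

root = ET.fromstring(urdf)
link = root.find("link")
# Same lookup path as the diff: collision/geometry/mesh under the link.
collision = link.find("collision/geometry/mesh")
if collision is not None:
    # Point the collision geometry at the decomposed convex mesh instead of
    # the visual mesh; on decomposition failure the diff keeps the original.
    collision.set("filename", "mesh/demo_collision.ply")
    collision.set("scale", "1.0 1.0 1.0")

print(collision.get("filename"))  # mesh/demo_collision.ply
```

Using a simplified convex collision mesh while keeping the detailed visual mesh is what makes the physics step cheaper without changing how the asset renders.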


@@ -9,10 +9,10 @@ PIP_INSTALL_PACKAGES=(
    "xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118"
    "flash-attn==2.7.0.post2 --no-build-isolation"
    "-r requirements.txt --use-deprecated=legacy-resolver"
-   "utils3d@git+https://github.com/EasternJournalist/utils3d.git#egg=9a4eb15"
+   "utils3d@git+https://github.com/EasternJournalist/utils3d.git@9a4eb15"
    "clip@git+https://github.com/openai/CLIP.git"
-   "segment-anything@git+https://github.com/facebookresearch/segment-anything.git#egg=dca509f"
-   "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git#egg=729261d"
+   "segment-anything@git+https://github.com/facebookresearch/segment-anything.git@dca509f"
+   "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git@729261d"
    "kolors@git+https://github.com/HochCC/Kolors.git"
    "kaolin@git+https://github.com/NVIDIAGameWorks/kaolin.git@v0.16.0"
    "git+https://github.com/nerfstudio-project/gsplat.git@v1.5.3"


@@ -7,7 +7,7 @@ packages = ["embodied_gen"]

 [project]
 name = "embodied_gen"
-version = "v0.1.3"
+version = "v0.1.4"
 readme = "README.md"
 license = "Apache-2.0"
 license-files = ["LICENSE", "NOTICE"]


@@ -38,5 +38,6 @@ tyro
 pyquaternion
 shapely
 sapien==3.0.0b1
+coacd
 mani_skill==3.0.0b21
 typing_extensions==4.14.1


@@ -39,6 +39,7 @@ def test_urdf_convertor():
         "outputs/test_urdf/notebook/result/mesh/notebook.obj",
         "outputs/test_urdf/marker/result/mesh/marker.obj",
         "outputs/test_urdf/pen2/result/mesh/pen.obj",
+        "outputs/test_urdf/pen3/result/mesh/pen.obj",
     ]
     for idx, mesh_path in enumerate(mesh_paths):
         filename = mesh_path.split("/")[-1].split(".")[0]
@@ -49,3 +50,19 @@ def test_urdf_convertor():
         # min_height=1.0,
         # max_height=1.2,
     )
+
+
+def test_decompose_convex_mesh():
+    urdf_gen = URDFGenerator(GPT_CLIENT, decompose_convex=True)
+    mesh_paths = [
+        "outputs/test_urdf/sample_0/mesh/pen.obj",
+        "outputs/test_urdf/sample_1/mesh/notepad.obj",
+        "outputs/test_urdf/sample_2/mesh/plate.obj",
+    ]
+    for idx, mesh_path in enumerate(mesh_paths):
+        filename = mesh_path.split("/")[-1].split(".")[0]
+        urdf_path = urdf_gen(
+            mesh_path=mesh_path,
+            output_root=f"outputs/test_urdf3/sample_{idx}",
+            category=filename,
+        )
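The `mesh_path.split("/")[-1].split(".")[0]` idiom in the tests above recovers the file stem; as a side note (not part of the diff), `pathlib` expresses the same thing more robustly:

```python
from pathlib import Path

mesh_path = "outputs/test_urdf/pen2/result/mesh/pen.obj"
# Equivalent to mesh_path.split("/")[-1].split(".")[0], but also handles
# platform-specific separators and names without an extension.
filename = Path(mesh_path).stem
print(filename)  # pen
```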


@@ -0,0 +1,44 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+import logging
+import tempfile
+from time import time
+
+import pytest
+
+from embodied_gen.data.convex_decomposer import decompose_convex_mesh
+
+logger = logging.getLogger(__name__)
+
+
+@pytest.mark.parametrize(
+    "input_mesh_path, max_convex_hull",
+    [
+        ("apps/assets/example_texture/meshes/robot_text.obj", 8),
+        # ("apps/assets/example_texture/meshes/robot_text.obj", 32),
+        # ("apps/assets/example_texture/meshes/robot_text.obj", 64),
+        # ("apps/assets/example_texture/meshes/robot_text.obj", 128),
+    ],
+)
+def test_decompose_convex_mesh(input_mesh_path, max_convex_hull):
+    d_params = dict(
+        threshold=0.05, max_convex_hull=max_convex_hull, verbose=False
+    )
+    with tempfile.NamedTemporaryFile(suffix='.ply', delete=True) as tmp_file:
+        start_time = time()
+        decompose_convex_mesh(input_mesh_path, tmp_file.name, **d_params)
+        logger.info(f"Finished in {time()-start_time:.2f}s")
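The new test writes the decomposition result to a throwaway file and logs wall-clock time; the same harness pattern can be run standalone with a stand-in `decompose` function (hypothetical, used only to keep the sketch runnable without coacd installed):

```python
import tempfile
from time import time

def decompose(input_path: str, output_path: str, **params) -> None:
    # Stand-in for embodied_gen.data.convex_decomposer.decompose_convex_mesh;
    # it just records what it was asked to do instead of decomposing a mesh.
    with open(output_path, "w") as f:
        f.write(f"decomposed {input_path} with {sorted(params)}\n")

d_params = dict(threshold=0.05, max_convex_hull=8, verbose=False)
# NamedTemporaryFile cleans up the output on exit, so repeated benchmark
# runs never leave stale .ply files behind.
with tempfile.NamedTemporaryFile(suffix=".ply", delete=True) as tmp_file:
    start_time = time()
    decompose("robot_text.obj", tmp_file.name, **d_params)
    elapsed = time() - start_time
    with open(tmp_file.name) as f:
        content = f.read()
print(content.strip())
```

Note that re-opening the temporary file by name works on POSIX systems; on Windows the file cannot be opened a second time while still held open.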