diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2a39367..36badac 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0).
+## [0.1.4] - 2025-08-29
+### Feature
+- Supports importing the output of the EmbodiedGen Layout Generation module with a single `gym.make` command to create multiple parallel simulation environments and interactive 3D worlds.
+- Supports coacd-based mesh decomposition for better efficiency and accuracy in physical simulations.
+
## [0.1.3] - 2025-08-19
### Feature
- Release `layout-cli` 🏞️ Layout(Interactive 3D Worlds) Generation, generate interactive 3D scene from task description.
diff --git a/README.md b/README.md
index f501f3f..3099ddd 100644
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@
```sh
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
-git checkout v0.1.3
+git checkout v0.1.4
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env.
conda activate embodiedgen
@@ -67,7 +67,8 @@ You can choose between two backends for the GPT agent:
[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D) Generate physically plausible 3D asset URDF from single input image, offering high-quality support for digital twin systems.
(HF space is a simplified demonstration. For the full functionality, please refer to `img3d-cli`.)
-
+
+
### ☁️ Service
Run the image-to-3D generation service locally.
@@ -95,7 +96,7 @@ img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/examp
[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D) Create 3D assets from text descriptions for a wide range of geometry and styles. (HF space is a simplified demonstration. For the full functionality, please refer to `text3d-cli`.)
-
+
### ☁️ Service
Deploy the text-to-3D generation service locally.
@@ -133,7 +134,7 @@ ps: models with more permissive licenses found in `embodied_gen/models/image_com
[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen) Generate visually rich textures for 3D mesh.
-
+
### ☁️ Service
@@ -161,7 +162,7 @@ bash embodied_gen/scripts/texture_gen.sh \
🌍 3D Scene Generation
-
+
### ⚡ API
> Run `bash install.sh extra` to install additional requirements if you need to use `scene3d-cli`.
@@ -184,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \
🚧 *Coming Soon*
-
+
---
@@ -196,12 +197,12 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \
@@ -219,8 +220,8 @@ layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on
@@ -233,7 +234,7 @@ CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
--output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &
```
-Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.)
+Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes.
```sh
python embodied_gen/scripts/compose_layout.py \
--layout_path "outputs/layouts_gens/task_0000/layout.json" \
@@ -243,13 +244,29 @@ python embodied_gen/scripts/compose_layout.py \
We provide `sim-cli`, that allows users to easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (will support for more simulators in future updates).
```sh
-sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \
---output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka"
+sim-cli --layout_path "outputs/layouts_gen/task_0000/layout.json" \
+--output_dir "outputs/layouts_gen/task_0000/sapien_render" --insert_robot
+```
+
+Example: generate multiple parallel simulation envs with `gym.make` and record sensor and trajectory data.
+
+
+
+  |
+  |
+
+
+
+```sh
+python embodied_gen/scripts/parallel_sim.py \
+--layout_file "outputs/layouts_gen/task_0000/layout.json" \
+--output_dir "outputs/parallel_sim/task_0000" \
+--num_envs 20
```
### 🖼️ Real-to-Sim Digital Twin
-
+
---
@@ -280,7 +297,7 @@ If you use EmbodiedGen in your research or projects, please cite:
## 🙌 Acknowledgement
EmbodiedGen builds upon the following amazing projects and models:
-🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
+🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | 🌟 [ManiSkill](https://github.com/haosulab/ManiSkill)
---
diff --git a/apps/assets/parallel_sim.gif b/apps/assets/parallel_sim.gif
new file mode 100644
index 0000000..dd4aea0
Binary files /dev/null and b/apps/assets/parallel_sim.gif differ
diff --git a/apps/assets/parallel_sim2.gif b/apps/assets/parallel_sim2.gif
new file mode 100644
index 0000000..10665f8
Binary files /dev/null and b/apps/assets/parallel_sim2.gif differ
diff --git a/apps/common.py b/apps/common.py
index e0e71bf..643151b 100644
--- a/apps/common.py
+++ b/apps/common.py
@@ -547,7 +547,9 @@ def extract_urdf(
# Convert to URDF and recover attrs by GPT.
filename = "sample"
- urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+ urdf_convertor = URDFGenerator(
+ GPT_CLIENT, render_view_num=4, decompose_convex=True
+ )
asset_attrs = {
"version": VERSION,
"gs_model": f"{urdf_convertor.output_mesh_dir}/{filename}_gs.ply",
diff --git a/embodied_gen/data/convex_decomposer.py b/embodied_gen/data/convex_decomposer.py
new file mode 100644
index 0000000..4fd2490
--- /dev/null
+++ b/embodied_gen/data/convex_decomposer.py
@@ -0,0 +1,161 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+import logging
+import multiprocessing as mp
+import os
+
+import coacd
+import trimesh
+
+logger = logging.getLogger(__name__)
+
+__all__ = [
+ "decompose_convex_coacd",
+ "decompose_convex_mesh",
+ "decompose_convex_process",
+]
+
+
+def decompose_convex_coacd(
+ filename: str, outfile: str, params: dict, verbose: bool = False
+) -> None:
+ coacd.set_log_level("info" if verbose else "warn")
+
+ mesh = trimesh.load(filename, force="mesh")
+ mesh = coacd.Mesh(mesh.vertices, mesh.faces)
+
+ result = coacd.run_coacd(mesh, **params)
+ combined = sum([trimesh.Trimesh(*m) for m in result])
+ combined.export(outfile)
+
+
+def decompose_convex_mesh(
+ filename: str,
+ outfile: str,
+ threshold: float = 0.05,
+ max_convex_hull: int = -1,
+ preprocess_mode: str = "auto",
+ preprocess_resolution: int = 30,
+ resolution: int = 2000,
+ mcts_nodes: int = 20,
+ mcts_iterations: int = 150,
+ mcts_max_depth: int = 3,
+ pca: bool = False,
+ merge: bool = True,
+ seed: int = 0,
+ verbose: bool = False,
+) -> str:
+ """Decompose a mesh into convex parts using the CoACD algorithm."""
+ coacd.set_log_level("info" if verbose else "warn")
+
+ if os.path.exists(outfile):
+ logger.warning(f"Output file {outfile} already exists, removing it.")
+ os.remove(outfile)
+
+ params = dict(
+ threshold=threshold,
+ max_convex_hull=max_convex_hull,
+ preprocess_mode=preprocess_mode,
+ preprocess_resolution=preprocess_resolution,
+ resolution=resolution,
+ mcts_nodes=mcts_nodes,
+ mcts_iterations=mcts_iterations,
+ mcts_max_depth=mcts_max_depth,
+ pca=pca,
+ merge=merge,
+ seed=seed,
+ )
+
+ try:
+ decompose_convex_coacd(filename, outfile, params, verbose)
+ if os.path.exists(outfile):
+ return outfile
+ except Exception as e:
+ if verbose:
+ print(f"Decompose convex first attempt failed: {e}.")
+
+ if preprocess_mode != "on":
+ try:
+ params["preprocess_mode"] = "on"
+ decompose_convex_coacd(filename, outfile, params, verbose)
+ if os.path.exists(outfile):
+ return outfile
+ except Exception as e:
+ if verbose:
+ print(
+ f"Decompose convex second attempt with preprocess_mode='on' failed: {e}"
+ )
+
+ raise RuntimeError(f"Convex decomposition failed on {filename}")
+
+
+def decompose_convex_mp(
+ filename: str,
+ outfile: str,
+ threshold: float = 0.05,
+ max_convex_hull: int = -1,
+ preprocess_mode: str = "auto",
+ preprocess_resolution: int = 30,
+ resolution: int = 2000,
+ mcts_nodes: int = 20,
+ mcts_iterations: int = 150,
+ mcts_max_depth: int = 3,
+ pca: bool = False,
+ merge: bool = True,
+ seed: int = 0,
+ verbose: bool = False,
+) -> str:
+ """Decompose a mesh into convex parts using the CoACD algorithm in a separate process.
+
+ See https://simulately.wiki/docs/toolkits/ConvexDecomp for details.
+ """
+ params = dict(
+ threshold=threshold,
+ max_convex_hull=max_convex_hull,
+ preprocess_mode=preprocess_mode,
+ preprocess_resolution=preprocess_resolution,
+ resolution=resolution,
+ mcts_nodes=mcts_nodes,
+ mcts_iterations=mcts_iterations,
+ mcts_max_depth=mcts_max_depth,
+ pca=pca,
+ merge=merge,
+ seed=seed,
+ )
+
+ ctx = mp.get_context("spawn")
+ p = ctx.Process(
+ target=decompose_convex_coacd,
+ args=(filename, outfile, params, verbose),
+ )
+ p.start()
+ p.join()
+ if p.exitcode == 0 and os.path.exists(outfile):
+ return outfile
+
+ if preprocess_mode != "on":
+ params["preprocess_mode"] = "on"
+ p = ctx.Process(
+ target=decompose_convex_coacd,
+ args=(filename, outfile, params, verbose),
+ )
+ p.start()
+ p.join()
+ if p.exitcode == 0 and os.path.exists(outfile):
+ return outfile
+
+ raise RuntimeError(f"Convex decomposition failed on {filename}")
diff --git a/embodied_gen/data/mesh_operator.py b/embodied_gen/data/mesh_operator.py
index 38f0563..8d41ba0 100644
--- a/embodied_gen/data/mesh_operator.py
+++ b/embodied_gen/data/mesh_operator.py
@@ -16,13 +16,17 @@
import logging
+import multiprocessing as mp
+import os
from typing import Tuple, Union
+import coacd
import igraph
import numpy as np
import pyvista as pv
import spaces
import torch
+import trimesh
import utils3d
from pymeshfix import _meshfix
from tqdm import tqdm
@@ -33,7 +37,9 @@ logging.basicConfig(
logger = logging.getLogger(__name__)
-__all__ = ["MeshFixer"]
+__all__ = [
+ "MeshFixer",
+]
def _radical_inverse(base, n):
diff --git a/embodied_gen/envs/pick_embodiedgen.py b/embodied_gen/envs/pick_embodiedgen.py
new file mode 100644
index 0000000..9b14070
--- /dev/null
+++ b/embodied_gen/envs/pick_embodiedgen.py
@@ -0,0 +1,389 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+import json
+import os
+from copy import deepcopy
+
+import numpy as np
+import sapien
+import torch
+import torchvision.transforms as transforms
+from mani_skill.envs.sapien_env import BaseEnv
+from mani_skill.sensors.camera import CameraConfig
+from mani_skill.utils import sapien_utils
+from mani_skill.utils.building import actors
+from mani_skill.utils.registration import register_env
+from mani_skill.utils.structs.actor import Actor
+from mani_skill.utils.structs.pose import Pose
+from mani_skill.utils.structs.types import (
+ GPUMemoryConfig,
+ SceneConfig,
+ SimConfig,
+)
+from mani_skill.utils.visualization.misc import tile_images
+from tqdm import tqdm
+from embodied_gen.models.gs_model import GaussianOperator
+from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
+from embodied_gen.utils.geometry import bfs_placement, quaternion_multiply
+from embodied_gen.utils.log import logger
+from embodied_gen.utils.process_media import alpha_blend_rgba
+from embodied_gen.utils.simulation import (
+ SIM_COORD_ALIGN,
+ load_assets_from_layout_file,
+)
+
+__all__ = ["PickEmbodiedGen"]
+
+
+@register_env("PickEmbodiedGen-v1", max_episode_steps=100)
+class PickEmbodiedGen(BaseEnv):
+ SUPPORTED_ROBOTS = ["panda", "panda_wristcam", "fetch"]
+ goal_thresh = 0.0
+
+ def __init__(
+ self,
+ *args,
+ robot_uids: str | list[str] = "panda",
+ robot_init_qpos_noise: float = 0.02,
+ num_envs: int = 1,
+ reconfiguration_freq: int = None,
+ **kwargs,
+ ):
+ self.robot_init_qpos_noise = robot_init_qpos_noise
+ if reconfiguration_freq is None:
+ if num_envs == 1:
+ reconfiguration_freq = 1
+ else:
+ reconfiguration_freq = 0
+
+ # Init params from kwargs.
+ layout_file = kwargs.pop("layout_file", None)
+ replace_objs = kwargs.pop("replace_objs", True)
+ self.enable_grasp = kwargs.pop("enable_grasp", False)
+ self.init_quat = kwargs.pop("init_quat", [0.7071, 0, 0, 0.7071])
+ # Add small offset in z-axis to avoid collision.
+ self.objs_z_offset = kwargs.pop("objs_z_offset", 0.002)
+ self.robot_z_offset = kwargs.pop("robot_z_offset", 0.002)
+
+ self.layouts = self.init_env_layouts(
+ layout_file, num_envs, replace_objs
+ )
+ self.robot_pose = self.compute_robot_init_pose(
+ self.layouts, num_envs, self.robot_z_offset
+ )
+ self.env_actors = dict()
+ self.image_transform = transforms.PILToTensor()
+
+ super().__init__(
+ *args,
+ robot_uids=robot_uids,
+ reconfiguration_freq=reconfiguration_freq,
+ num_envs=num_envs,
+ **kwargs,
+ )
+
+ self.bg_images = dict()
+ if self.render_mode == "hybrid":
+ self.bg_images = self.render_gs3d_images(
+ self.layouts, num_envs, self.init_quat
+ )
+
+ @staticmethod
+ def init_env_layouts(
+ layout_file: str, num_envs: int, replace_objs: bool
+ ) -> list[LayoutInfo]:
+ layout = LayoutInfo.from_dict(json.load(open(layout_file, "r")))
+ layouts = []
+ for env_idx in range(num_envs):
+ if replace_objs and env_idx > 0:
+ layout = bfs_placement(deepcopy(layout))
+ layouts.append(layout)
+
+ return layouts
+
+ @staticmethod
+ def compute_robot_init_pose(
+ layouts: list[LayoutInfo], num_envs: int, z_offset: float = 0.0
+ ) -> list[list[float]]:
+ robot_pose = []
+ for env_idx in range(num_envs):
+ layout = layouts[env_idx]
+ robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
+ x, y, z, qx, qy, qz, qw = layout.position[robot_node]
+ robot_pose.append([x, y, z + z_offset, qw, qx, qy, qz])
+
+ return robot_pose
+
+ @property
+ def _default_sim_config(self):
+ return SimConfig(
+ scene_config=SceneConfig(
+ solver_position_iterations=30,
+ # contact_offset=0.04,
+ # rest_offset=0.001,
+ ),
+ # sim_freq=200,
+ control_freq=50,
+ gpu_memory_config=GPUMemoryConfig(
+ max_rigid_contact_count=2**20, max_rigid_patch_count=2**19
+ ),
+ )
+
+ @property
+ def _default_sensor_configs(self):
+ pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1])
+
+ return [
+ CameraConfig("base_camera", pose, 128, 128, np.pi / 2, 0.01, 100)
+ ]
+
+ @property
+ def _default_human_render_camera_configs(self):
+ pose = sapien_utils.look_at(
+ eye=[0.9, 0.0, 1.1], target=[0.0, 0.0, 0.9]
+ )
+
+ return CameraConfig(
+ "render_camera", pose, 256, 256, np.deg2rad(75), 0.01, 100
+ )
+
+ def _load_agent(self, options: dict):
+ super()._load_agent(options, sapien.Pose(p=[-10, 0, 10]))
+
+ def _load_scene(self, options: dict):
+ all_objects = []
+ logger.info(f"Loading assets and decomposition mesh collisions...")
+ for env_idx in range(self.num_envs):
+ env_actors = load_assets_from_layout_file(
+ self.scene,
+ self.layouts[env_idx],
+ z_offset=self.objs_z_offset,
+ init_quat=self.init_quat,
+ env_idx=env_idx,
+ )
+ self.env_actors[f"env{env_idx}"] = env_actors
+ all_objects.extend(env_actors.values())
+
+ self.obj = all_objects[-1]
+ for obj in all_objects:
+ self.remove_from_state_dict_registry(obj)
+
+ self.all_objects = Actor.merge(all_objects, name="all_objects")
+ self.add_to_state_dict_registry(self.all_objects)
+
+ self.goal_site = actors.build_sphere(
+ self.scene,
+ radius=self.goal_thresh,
+ color=[0, 1, 0, 0],
+ name="goal_site",
+ body_type="kinematic",
+ add_collision=False,
+ initial_pose=sapien.Pose(),
+ )
+ self._hidden_objects.append(self.goal_site)
+
+ def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
+ with torch.device(self.device):
+ b = len(env_idx)
+ goal_xyz = torch.zeros((b, 3))
+ goal_xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1
+ self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))
+
+ qpos = np.array(
+ [
+ 0.0,
+ np.pi / 8,
+ 0,
+ -np.pi * 3 / 8,
+ 0,
+ np.pi * 3 / 4,
+ np.pi / 4,
+ 0.04,
+ 0.04,
+ ]
+ )
+ qpos = (
+ np.random.normal(
+ 0, self.robot_init_qpos_noise, (self.num_envs, len(qpos))
+ )
+ + qpos
+ )
+ qpos[:, -2:] = 0.04
+ self.agent.robot.set_root_pose(np.array(self.robot_pose))
+ self.agent.reset(qpos)
+ self.agent.init_qpos = qpos
+ self.agent.controller.controllers["gripper"].reset()
+
+ def render_gs3d_images(
+ self, layouts: list[LayoutInfo], num_envs: int, init_quat: list[float]
+ ) -> dict[str, np.ndarray]:
+ sim_coord_align = (
+ torch.tensor(SIM_COORD_ALIGN).to(torch.float32).to(self.device)
+ )
+ cameras = self.scene.sensors.copy()
+ cameras.update(self.scene.human_render_cameras)
+
+ bg_node = layouts[0].relation[Scene3DItemEnum.BACKGROUND.value]
+ gs_path = os.path.join(layouts[0].assets[bg_node], "gs_model.ply")
+ raw_gs: GaussianOperator = GaussianOperator.load_from_ply(gs_path)
+ bg_images = dict()
+ for env_idx in tqdm(range(num_envs), desc="Pre-rendering Background"):
+ layout = layouts[env_idx]
+ x, y, z, qx, qy, qz, qw = layout.position[bg_node]
+ qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
+ init_pose = torch.tensor([x, y, z, qx, qy, qz, qw])
+ gs_model = raw_gs.get_gaussians(instance_pose=init_pose)
+ for key in cameras:
+ camera = cameras[key]
+ Ks = camera.camera.get_intrinsic_matrix() # (n_env, 3, 3)
+ c2w = camera.camera.get_model_matrix() # (n_env, 4, 4)
+ result = gs_model.render(
+ c2w[env_idx] @ sim_coord_align,
+ Ks[env_idx],
+ image_width=camera.config.width,
+ image_height=camera.config.height,
+ )
+ bg_images[f"{key}-env{env_idx}"] = result.rgb[..., ::-1]
+
+ return bg_images
+
+ def render(self):
+ if self.render_mode is None:
+ raise RuntimeError("render_mode is not set.")
+ if self.render_mode == "human":
+ return self.render_human()
+ elif self.render_mode == "rgb_array":
+ res = self.render_rgb_array()
+ return res
+ elif self.render_mode == "sensors":
+ res = self.render_sensors()
+ return res
+ elif self.render_mode == "all":
+ return self.render_all()
+ elif self.render_mode == "hybrid":
+ return self.hybrid_render()
+ else:
+ raise NotImplementedError(
+ f"Unsupported render mode {self.render_mode}."
+ )
+
+ def render_rgb_array(
+ self, camera_name: str = None, return_alpha: bool = False
+ ):
+ for obj in self._hidden_objects:
+ obj.show_visual()
+ self.scene.update_render(
+ update_sensors=False, update_human_render_cameras=True
+ )
+ images = []
+ render_images = self.scene.get_human_render_camera_images(
+ camera_name, return_alpha
+ )
+ for image in render_images.values():
+ images.append(image)
+ if len(images) == 0:
+ return None
+ if len(images) == 1:
+ return images[0]
+ for obj in self._hidden_objects:
+ obj.hide_visual()
+ return tile_images(images)
+
+ def render_sensors(self):
+ images = []
+ sensor_images = self.get_sensor_images()
+ for image in sensor_images.values():
+ for img in image.values():
+ images.append(img)
+ return tile_images(images)
+
+ def hybrid_render(self):
+ fg_images = self.render_rgb_array(
+ return_alpha=True
+ ) # (n_env, h, w, 3)
+ images = []
+ for key in self.bg_images:
+ if "render_camera" not in key:
+ continue
+ env_idx = int(key.split("-env")[-1])
+ rgba = alpha_blend_rgba(
+ fg_images[env_idx].cpu().numpy(), self.bg_images[key]
+ )
+ images.append(self.image_transform(rgba))
+
+ images = torch.stack(images, dim=0)
+ images = images.permute(0, 2, 3, 1)
+
+ return images[..., :3]
+
+ def evaluate(self):
+ obj_to_goal_pos = (
+ self.obj.pose.p
+ ) # self.goal_site.pose.p - self.obj.pose.p
+ is_obj_placed = (
+ torch.linalg.norm(obj_to_goal_pos, axis=1) <= self.goal_thresh
+ )
+ is_grasped = self.agent.is_grasping(self.obj)
+ is_robot_static = self.agent.is_static(0.2)
+
+ return dict(
+ is_grasped=is_grasped,
+ obj_to_goal_pos=obj_to_goal_pos,
+ is_obj_placed=is_obj_placed,
+ is_robot_static=is_robot_static,
+ is_grasping=self.agent.is_grasping(self.obj),
+ success=torch.logical_and(is_obj_placed, is_robot_static),
+ )
+
+ def _get_obs_extra(self, info: dict):
+
+ return dict()
+
+ def compute_dense_reward(self, obs: any, action: torch.Tensor, info: dict):
+ tcp_to_obj_dist = torch.linalg.norm(
+ self.obj.pose.p - self.agent.tcp.pose.p, axis=1
+ )
+ reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
+ reward = reaching_reward
+
+ is_grasped = info["is_grasped"]
+ reward += is_grasped
+
+ # obj_to_goal_dist = torch.linalg.norm(
+ # self.goal_site.pose.p - self.obj.pose.p, axis=1
+ # )
+ obj_to_goal_dist = torch.linalg.norm(
+ self.obj.pose.p - self.obj.pose.p, axis=1
+ )
+ place_reward = 1 - torch.tanh(5 * obj_to_goal_dist)
+ reward += place_reward * is_grasped
+
+ reward += info["is_obj_placed"] * is_grasped
+
+ static_reward = 1 - torch.tanh(
+ 5
+ * torch.linalg.norm(self.agent.robot.get_qvel()[..., :-2], axis=1)
+ )
+ reward += static_reward * info["is_obj_placed"] * is_grasped
+
+ reward[info["success"]] = 6
+ return reward
+
+ def compute_normalized_dense_reward(
+ self, obs: any, action: torch.Tensor, info: dict
+ ):
+ return self.compute_dense_reward(obs=obs, action=action, info=info) / 6
diff --git a/embodied_gen/scripts/gen_layout.py b/embodied_gen/scripts/gen_layout.py
index e4255f1..5df6973 100644
--- a/embodied_gen/scripts/gen_layout.py
+++ b/embodied_gen/scripts/gen_layout.py
@@ -137,7 +137,7 @@ def entrypoint() -> None:
sim_cli(
layout_path=layout_path,
output_dir=output_root,
- robot_name="franka" if args.insert_robot else None,
+ insert_robot=args.insert_robot,
)
torch.cuda.empty_cache()
diff --git a/embodied_gen/scripts/imageto3d.py b/embodied_gen/scripts/imageto3d.py
index bf86534..14d3202 100644
--- a/embodied_gen/scripts/imageto3d.py
+++ b/embodied_gen/scripts/imageto3d.py
@@ -58,7 +58,7 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "false"
os.environ["SPCONV_ALGO"] = "native"
random.seed(0)
-logger.info("Loading Models...")
+logger.info("Loading Image3D Models...")
DELIGHT = DelightingModel()
IMAGESR_MODEL = ImageRealESRGAN(outscale=4)
RBG_REMOVER = RembgRemover()
@@ -107,6 +107,7 @@ def parse_args():
type=int,
default=2,
)
+ parser.add_argument("--disable_decompose_convex", action="store_true")
args, unknown = parser.parse_known_args()
return args
@@ -251,7 +252,11 @@ def entrypoint(**kwargs):
mesh_glb_path = os.path.join(output_root, f"{filename}.glb")
mesh.export(mesh_glb_path)
- urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+ urdf_convertor = URDFGenerator(
+ GPT_CLIENT,
+ render_view_num=4,
+ decompose_convex=not args.disable_decompose_convex,
+ )
asset_attrs = {
"version": VERSION,
"gs_model": f"{urdf_convertor.output_mesh_dir}/{filename}_gs.ply",
diff --git a/embodied_gen/scripts/parallel_sim.py b/embodied_gen/scripts/parallel_sim.py
new file mode 100644
index 0000000..91d3fed
--- /dev/null
+++ b/embodied_gen/scripts/parallel_sim.py
@@ -0,0 +1,148 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+
+from embodied_gen.utils.monkey_patches import monkey_patch_maniskill
+
+monkey_patch_maniskill()
+import json
+from collections import defaultdict
+from dataclasses import dataclass
+from typing import Literal
+
+import gymnasium as gym
+import numpy as np
+import torch
+import tyro
+from mani_skill.utils.wrappers import RecordEpisode
+from tqdm import tqdm
+import embodied_gen.envs.pick_embodiedgen
+from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
+from embodied_gen.utils.log import logger
+from embodied_gen.utils.simulation import FrankaPandaGrasper
+
+
+@dataclass
+class ParallelSimConfig:
+ """CLI parameters for Parallel Sapien simulation."""
+
+ # Environment configuration
+ layout_file: str
+ """Path to the layout JSON file"""
+ output_dir: str
+ """Directory to save recorded videos"""
+ gym_env_name: str = "PickEmbodiedGen-v1"
+ """Name of the Gym environment to use"""
+ num_envs: int = 4
+ """Number of parallel environments"""
+ render_mode: Literal["rgb_array", "hybrid"] = "hybrid"
+ """Rendering mode: rgb_array or hybrid"""
+ enable_shadow: bool = True
+ """Whether to enable shadows in rendering"""
+ control_mode: str = "pd_joint_pos"
+ """Control mode for the agent"""
+
+ # Recording configuration
+ max_steps_per_video: int = 1000
+ """Maximum steps to record per video"""
+ save_trajectory: bool = False
+ """Whether to save trajectory data"""
+
+ # Simulation parameters
+ seed: int = 0
+ """Random seed for environment reset"""
+ warmup_steps: int = 50
+ """Number of warmup steps before action computation"""
+ reach_target_only: bool = True
+ """Whether to only reach target without full action"""
+
+
+def entrypoint(**kwargs):
+ if kwargs is None or len(kwargs) == 0:
+ cfg = tyro.cli(ParallelSimConfig)
+ else:
+ cfg = ParallelSimConfig(**kwargs)
+
+ env = gym.make(
+ cfg.gym_env_name,
+ num_envs=cfg.num_envs,
+ render_mode=cfg.render_mode,
+ enable_shadow=cfg.enable_shadow,
+ layout_file=cfg.layout_file,
+ control_mode=cfg.control_mode,
+ )
+ env = RecordEpisode(
+ env,
+ cfg.output_dir,
+ max_steps_per_video=cfg.max_steps_per_video,
+ save_trajectory=cfg.save_trajectory,
+ )
+ env.reset(seed=cfg.seed)
+
+ default_action = env.unwrapped.agent.init_qpos[:, :8]
+ for _ in tqdm(range(cfg.warmup_steps), desc="SIM Warmup"):
+ # action = env.action_space.sample() # Random action
+ obs, reward, terminated, truncated, info = env.step(default_action)
+
+ grasper = FrankaPandaGrasper(
+ env.unwrapped.agent,
+ env.unwrapped.sim_config.control_freq,
+ )
+
+ layout_data = LayoutInfo.from_dict(json.load(open(cfg.layout_file, "r")))
+ actions = defaultdict(list)
+ # Plan Grasp reach pose for each manipulated object in each env.
+ for env_idx in range(env.num_envs):
+ actors = env.unwrapped.env_actors[f"env{env_idx}"]
+ for node in layout_data.relation[
+ Scene3DItemEnum.MANIPULATED_OBJS.value
+ ]:
+ action = grasper.compute_grasp_action(
+ actor=actors[node]._objs[0],
+ reach_target_only=True,
+ env_idx=env_idx,
+ )
+ actions[node].append(action)
+
+ # Excute the planned actions for each manipulated object in each env.
+ for node in actions:
+ max_env_steps = 0
+ for env_idx in range(env.num_envs):
+ if actions[node][env_idx] is None:
+ continue
+ max_env_steps = max(max_env_steps, len(actions[node][env_idx]))
+
+ action_tensor = np.ones(
+ (max_env_steps, env.num_envs, env.action_space.shape[-1])
+ )
+ action_tensor *= default_action[None, ...]
+ for env_idx in range(env.num_envs):
+ action = actions[node][env_idx]
+ if action is None:
+ continue
+ action_tensor[: len(action), env_idx, :] = action
+
+ for step in tqdm(range(max_env_steps), desc=f"Grasping: {node}"):
+ action = torch.Tensor(action_tensor[step]).to(env.unwrapped.device)
+ env.unwrapped.agent.set_action(action)
+ obs, reward, terminated, truncated, info = env.step(action)
+
+ env.close()
+ logger.info(f"Results saved in {cfg.output_dir}")
+
+
+if __name__ == "__main__":
+ entrypoint()
diff --git a/embodied_gen/scripts/simulate_sapien.py b/embodied_gen/scripts/simulate_sapien.py
index 529b765..faa6fbf 100644
--- a/embodied_gen/scripts/simulate_sapien.py
+++ b/embodied_gen/scripts/simulate_sapien.py
@@ -35,6 +35,9 @@ from embodied_gen.utils.simulation import (
SIM_COORD_ALIGN,
FrankaPandaGrasper,
SapienSceneManager,
+ load_assets_from_layout_file,
+ load_mani_skill_robot,
+ render_images,
)
@@ -50,8 +53,8 @@ class SapienSimConfig:
default_factory=lambda: [0.7071, 0, 0, 0.7071]
) # xyzw
device: str = "cuda"
- robot_name: str | None = None
- control_freq: int = 40
+ control_freq: int = 50
+ insert_robot: bool = False
# Camera settings.
render_interval: int = 10
num_cameras: int = 3
@@ -91,48 +94,41 @@ def entrypoint(**kwargs):
layout_data = json.load(f)
layout_data: LayoutInfo = LayoutInfo.from_dict(layout_data)
- scene_manager.load_assets_from_layout_file(
+ actors = load_assets_from_layout_file(
+ scene_manager.scene,
layout_data,
cfg.z_offset,
cfg.init_quat,
- cfg.robot_name,
- cfg.control_freq,
+ )
+ agent = load_mani_skill_robot(
+ scene_manager.scene, layout_data, cfg.control_freq
)
frames = defaultdict(list)
image_cnt = 0
for step in tqdm(range(cfg.sim_step), desc="Simulation"):
scene_manager.scene.step()
- if cfg.robot_name is not None:
- scene_manager.robot.reset(scene_manager.robot.init_qpos)
+ agent.reset(agent.init_qpos)
if step % cfg.render_interval != 0:
continue
scene_manager.scene.update_render()
image_cnt += 1
for camera in scene_manager.cameras:
camera.take_picture()
- images = scene_manager.render_images(
- camera, render_keys=cfg.render_keys
- )
+ images = render_images(camera, cfg.render_keys)
frames[camera.name].append(images)
- grasp_frames = defaultdict(dict)
- if cfg.robot_name is not None:
- sim_steps_per_control = cfg.sim_freq // cfg.control_freq
- control_timestep = 1.0 / cfg.control_freq
+ actions = dict()
+ if cfg.insert_robot:
grasper = FrankaPandaGrasper(
- scene_manager.robot,
- scene_manager,
- sim_steps_per_control,
- control_timestep,
+ agent,
+ cfg.control_freq,
)
for node in layout_data.relation[
Scene3DItemEnum.MANIPULATED_OBJS.value
]:
- grasp_frames[node] = grasper.render_grasp(
- scene_manager.actors[node],
- scene_manager.cameras,
- cfg.render_keys,
+ actions[node] = grasper.compute_grasp_action(
+ actor=actors[node], reach_target_only=True
)
if "Foreground" not in cfg.render_keys:
@@ -160,22 +156,36 @@ def entrypoint(**kwargs):
bg_images[camera.name] = result.rgb[..., ::-1]
video_frames = []
- for camera in scene_manager.cameras:
+ for idx, camera in enumerate(scene_manager.cameras):
# Scene rendering
- for step in tqdm(range(image_cnt), desc="Rendering"):
- rgba = alpha_blend_rgba(
- frames[camera.name][step]["Foreground"],
- bg_images[camera.name],
- )
- video_frames.append(np.array(rgba))
- # Grasp rendering
- for node in grasp_frames:
- for frame in grasp_frames[node][camera.name]:
+ if idx == 0:
+ for step in range(image_cnt):
rgba = alpha_blend_rgba(
- frame["Foreground"], bg_images[camera.name]
+ frames[camera.name][step]["Foreground"],
+ bg_images[camera.name],
)
video_frames.append(np.array(rgba))
+ # Grasp rendering
+ for node in actions:
+ if actions[node] is None:
+ continue
+ for action in tqdm(actions[node]):
+ grasp_frames = scene_manager.step_action(
+ agent,
+ torch.Tensor(action[None, ...]),
+ scene_manager.cameras,
+ cfg.render_keys,
+ sim_steps_per_control=cfg.sim_freq // cfg.control_freq,
+ )
+ rgba = alpha_blend_rgba(
+ grasp_frames[camera.name][0]["Foreground"],
+ bg_images[camera.name],
+ )
+ video_frames.append(np.array(rgba))
+
+ agent.reset(agent.init_qpos)
+
os.makedirs(cfg.output_dir, exist_ok=True)
video_path = f"{cfg.output_dir}/Iscene.mp4"
imageio.mimsave(video_path, video_frames, fps=30)
diff --git a/embodied_gen/scripts/textto3d.py b/embodied_gen/scripts/textto3d.py
index 03a9d2f..64475e3 100644
--- a/embodied_gen/scripts/textto3d.py
+++ b/embodied_gen/scripts/textto3d.py
@@ -42,7 +42,7 @@ from embodied_gen.validators.quality_checkers import (
os.environ["TOKENIZERS_PARALLELISM"] = "false"
random.seed(0)
-logger.info("Loading Models...")
+logger.info("Loading TEXT2IMG_MODEL...")
SEMANTIC_CHECKER = SemanticConsistChecker(GPT_CLIENT)
SEG_CHECKER = ImageSegChecker(GPT_CLIENT)
TXTGEN_CHECKER = TextGenAlignChecker(GPT_CLIENT)
@@ -170,6 +170,7 @@ def text_to_3d(**kwargs) -> dict:
seed=random.randint(0, 100000) if seed_3d is None else seed_3d,
n_retry=args.n_asset_retry,
keep_intermediate=args.keep_intermediate,
+ disable_decompose_convex=args.disable_decompose_convex,
)
mesh_path = f"{node_save_dir}/result/mesh/{save_node}.obj"
image_path = render_asset3d(
@@ -270,6 +271,7 @@ def parse_args():
help="Random seed for 3D generation",
)
parser.add_argument("--keep_intermediate", action="store_true")
+ parser.add_argument("--disable_decompose_convex", action="store_true")
args, unknown = parser.parse_known_args()
diff --git a/embodied_gen/utils/monkey_patches.py b/embodied_gen/utils/monkey_patches.py
index 79e77d5..81076cd 100644
--- a/embodied_gen/utils/monkey_patches.py
+++ b/embodied_gen/utils/monkey_patches.py
@@ -18,6 +18,7 @@ import os
import sys
import zipfile
+import numpy as np
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf
@@ -150,3 +151,68 @@ def monkey_patch_pano2room():
self.inpaint_pipe = pipe
SDFTInpainter.__init__ = patched_sd_inpaint_init
+
+
+def monkey_patch_maniskill():
+ from mani_skill.envs.scene import ManiSkillScene
+
+ def get_sensor_images(
+ self, obs: dict[str, any]
+ ) -> dict[str, dict[str, torch.Tensor]]:
+ sensor_data = dict()
+ for name, sensor in self.sensors.items():
+ sensor_data[name] = sensor.get_images(obs[name])
+ return sensor_data
+
+ def get_human_render_camera_images(
+ self, camera_name: str = None, return_alpha: bool = False
+ ) -> dict[str, torch.Tensor]:
+ def get_rgba_tensor(camera, return_alpha):
+ color = camera.get_obs(
+ rgb=True, depth=False, segmentation=False, position=False
+ )["rgb"]
+ if return_alpha:
+ seg_labels = camera.get_obs(
+ rgb=False, depth=False, segmentation=True, position=False
+ )["segmentation"]
+ masks = np.where((seg_labels.cpu() > 0), 255, 0).astype(
+ np.uint8
+ )
+ masks = torch.tensor(masks).to(color.device)
+ color = torch.concat([color, masks], dim=-1)
+
+ return color
+
+ image_data = dict()
+ if self.gpu_sim_enabled:
+ if self.parallel_in_single_scene:
+ for name, camera in self.human_render_cameras.items():
+ camera.camera._render_cameras[0].take_picture()
+ rgba = get_rgba_tensor(camera, return_alpha)
+ image_data[name] = rgba
+ else:
+ for name, camera in self.human_render_cameras.items():
+ if camera_name is not None and name != camera_name:
+ continue
+ assert camera.config.shader_config.shader_pack not in [
+ "rt",
+ "rt-fast",
+ "rt-med",
+ ], "ray tracing shaders do not work with parallel rendering"
+ camera.capture()
+ rgba = get_rgba_tensor(camera, return_alpha)
+ image_data[name] = rgba
+ else:
+ for name, camera in self.human_render_cameras.items():
+ if camera_name is not None and name != camera_name:
+ continue
+ camera.capture()
+ rgba = get_rgba_tensor(camera, return_alpha)
+ image_data[name] = rgba
+
+ return image_data
+
+ ManiSkillScene.get_sensor_images = get_sensor_images
+ ManiSkillScene.get_human_render_camera_images = (
+ get_human_render_camera_images
+ )
diff --git a/embodied_gen/utils/simulation.py b/embodied_gen/utils/simulation.py
index b858a3c..36cf4c8 100644
--- a/embodied_gen/utils/simulation.py
+++ b/embodied_gen/utils/simulation.py
@@ -14,7 +14,7 @@
# implied. See the License for the specific language governing
# permissions and limitations under the License.
-
+import json
import logging
import os
import xml.etree.ElementTree as ET
@@ -24,19 +24,20 @@ from typing import Literal
import mplib
import numpy as np
import sapien.core as sapien
+import sapien.physx as physx
import torch
-from PIL import Image, ImageColor
-from scipy.spatial.transform import Rotation as R
-from tqdm import tqdm
-from embodied_gen.data.utils import DiffrastRender
-from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
-from embodied_gen.utils.geometry import quaternion_multiply
-
-logger = logging.getLogger(__name__)
+from mani_skill.agents.base_agent import BaseAgent
+from mani_skill.envs.scene import ManiSkillScene
from mani_skill.examples.motionplanning.panda.utils import (
compute_grasp_info_by_obb,
)
from mani_skill.utils.geometry.trimesh_utils import get_component_mesh
+from PIL import Image, ImageColor
+from scipy.spatial.transform import Rotation as R
+from embodied_gen.data.utils import DiffrastRender
+from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
+from embodied_gen.utils.geometry import quaternion_multiply
+from embodied_gen.utils.log import logger
COLORMAP = list(set(ImageColor.colormap.values()))
COLOR_PALETTE = np.array(
@@ -54,10 +55,267 @@ SIM_COORD_ALIGN = np.array(
__all__ = [
"SIM_COORD_ALIGN",
"FrankaPandaGrasper",
- "SimpleGrasper",
+ "load_assets_from_layout_file",
+ "load_mani_skill_robot",
+ "render_images",
]
+def load_actor_from_urdf(
+ scene: ManiSkillScene | sapien.Scene,
+ file_path: str,
+ pose: sapien.Pose,
+ env_idx: int = None,
+ use_static: bool = False,
+ update_mass: bool = False,
+) -> sapien.pysapien.Entity:
+ tree = ET.parse(file_path)
+ root = tree.getroot()
+ node_name = root.get("name")
+ file_dir = os.path.dirname(file_path)
+
+ visual_mesh = root.find('.//visual/geometry/mesh')
+ visual_file = visual_mesh.get("filename")
+ visual_scale = visual_mesh.get("scale", "1.0 1.0 1.0")
+ visual_scale = np.array([float(x) for x in visual_scale.split()])
+
+ collision_mesh = root.find('.//collision/geometry/mesh')
+ collision_file = collision_mesh.get("filename")
+ collision_scale = collision_mesh.get("scale", "1.0 1.0 1.0")
+ collision_scale = np.array([float(x) for x in collision_scale.split()])
+
+ visual_file = os.path.join(file_dir, visual_file)
+ collision_file = os.path.join(file_dir, collision_file)
+ static_fric = root.find('.//collision/gazebo/mu1').text
+ dynamic_fric = root.find('.//collision/gazebo/mu2').text
+
+ material = physx.PhysxMaterial(
+ static_friction=np.clip(float(static_fric), 0.1, 0.7),
+ dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
+ restitution=0.05,
+ )
+ builder = scene.create_actor_builder()
+
+ body_type = "static" if use_static else "dynamic"
+ builder.set_physx_body_type(body_type)
+ builder.add_multiple_convex_collisions_from_file(
+ collision_file if body_type == "dynamic" else visual_file,
+ material=material,
+ scale=collision_scale if body_type == "dynamic" else visual_scale,
+ # decomposition="coacd",
+ # decomposition_params=dict(
+ # threshold=0.05, max_convex_hull=64, verbose=False
+ # ),
+ )
+
+ builder.add_visual_from_file(visual_file, scale=visual_scale)
+ builder.set_initial_pose(pose)
+ if isinstance(scene, ManiSkillScene) and env_idx is not None:
+ builder.set_scene_idxs([env_idx])
+
+ actor = builder.build(name=f"{node_name}-{env_idx}")
+
+ if update_mass and hasattr(actor.components[1], "mass"):
+ node_mass = float(root.find('.//inertial/mass').get("value"))
+ actor.components[1].set_mass(node_mass)
+
+ return actor
+
+
+def load_assets_from_layout_file(
+ scene: ManiSkillScene | sapien.Scene,
+ layout: LayoutInfo | str,
+ z_offset: float = 0.0,
+ init_quat: list[float] = [0, 0, 0, 1],
+ env_idx: int = None,
+) -> dict[str, sapien.pysapien.Entity]:
+ """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.
+
+ Args:
+ scene (sapien.Scene | ManiSkillScene): The SAPIEN or ManiSkill scene to load assets into.
+ layout (LayoutInfo): The layout information data.
+ z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
+ init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
+ env_idx (int): Environment index for multi-environment setup.
+ """
+ if isinstance(layout, str) and layout.endswith(".json"):
+ layout = LayoutInfo.from_dict(json.load(open(layout, "r")))
+
+ actors = dict()
+ for node in layout.assets:
+ file_dir = layout.assets[node]
+ file_name = f"{node.replace(' ', '_')}.urdf"
+ urdf_file = os.path.join(file_dir, file_name)
+
+ if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
+ continue
+
+ position = layout.position[node].copy()
+ if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
+ position[2] += z_offset
+
+ use_static = (
+ layout.relation.get(Scene3DItemEnum.CONTEXT.value, None) == node
+ )
+
+ # Combine initial quaternion with object quaternion
+ x, y, z, qx, qy, qz, qw = position
+ qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
+ actor = load_actor_from_urdf(
+ scene,
+ urdf_file,
+ sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
+ env_idx,
+ use_static=use_static,
+ update_mass=False,
+ )
+ actors[node] = actor
+
+ return actors
+
+
+def load_mani_skill_robot(
+ scene: sapien.Scene | ManiSkillScene,
+ layout: LayoutInfo | str,
+ control_freq: int = 20,
+ robot_init_qpos_noise: float = 0.0,
+ control_mode: str = "pd_joint_pos",
+ backend_str: tuple[str, str] = ("cpu", "gpu"),
+) -> BaseAgent:
+ from mani_skill.agents import REGISTERED_AGENTS
+ from mani_skill.envs.scene import ManiSkillScene
+ from mani_skill.envs.utils.system.backend import (
+ parse_sim_and_render_backend,
+ )
+
+ if isinstance(layout, str) and layout.endswith(".json"):
+ layout = LayoutInfo.from_dict(json.load(open(layout, "r")))
+
+ robot_name = layout.relation[Scene3DItemEnum.ROBOT.value]
+ x, y, z, qx, qy, qz, qw = layout.position[robot_name]
+ delta_z = 0.002 # Add small offset to avoid collision.
+ pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
+
+ if robot_name not in REGISTERED_AGENTS:
+ logger.warning(
+ f"Robot `{robot_name}` not registered, chosen from {REGISTERED_AGENTS.keys()}, use `panda` instead."
+ )
+ robot_name = "panda"
+
+ ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
+ backend = parse_sim_and_render_backend(*backend_str)
+ if isinstance(scene, sapien.Scene):
+ scene = ManiSkillScene([scene], device=backend_str[0], backend=backend)
+ robot = ROBOT_CLS(
+ scene=scene,
+ control_freq=control_freq,
+ control_mode=control_mode,
+ initial_pose=pose,
+ )
+
+ # Set robot init joint rad agree(joint0 to joint6 w 2 finger).
+ qpos = np.array(
+ [
+ 0.0,
+ np.pi / 8,
+ 0,
+ -np.pi * 3 / 8,
+ 0,
+ np.pi * 3 / 4,
+ np.pi / 4,
+ 0.04,
+ 0.04,
+ ]
+ )
+ qpos = (
+ np.random.normal(
+ 0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
+ )
+ + qpos
+ )
+ qpos[:, -2:] = 0.04
+ robot.reset(qpos)
+ robot.init_qpos = robot.robot.qpos
+ robot.controller.controllers["gripper"].reset()
+
+ return robot
+
+
+def render_images(
+ camera: sapien.render.RenderCameraComponent,
+ render_keys: list[
+ Literal[
+ "Color",
+ "Segmentation",
+ "Normal",
+ "Mask",
+ "Depth",
+ "Foreground",
+ ]
+ ] = None,
+) -> dict[str, Image.Image]:
+ """Render images from a given sapien camera.
+
+ Args:
+ camera (sapien.render.RenderCameraComponent): The camera to render from.
+ render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).
+
+ Returns:
+ Dict[str, Image.Image]: Dictionary of rendered images.
+ """
+ if render_keys is None:
+ render_keys = [
+ "Color",
+ "Segmentation",
+ "Normal",
+ "Mask",
+ "Depth",
+ "Foreground",
+ ]
+
+ results: dict[str, Image.Image] = {}
+ if "Color" in render_keys:
+ color = camera.get_picture("Color")
+ color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
+ results["Color"] = Image.fromarray(color_rgb)
+
+ if "Mask" in render_keys:
+ alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
+ results["Mask"] = Image.fromarray(alpha)
+
+ if "Segmentation" in render_keys:
+ seg_labels = camera.get_picture("Segmentation")
+ label0 = seg_labels[..., 0].astype(np.uint8)
+ seg_color = COLOR_PALETTE[label0]
+ results["Segmentation"] = Image.fromarray(seg_color)
+
+ if "Foreground" in render_keys:
+ seg_labels = camera.get_picture("Segmentation")
+ label0 = seg_labels[..., 0]
+ mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
+ color = camera.get_picture("Color")
+ color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
+ foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
+ results["Foreground"] = Image.fromarray(foreground)
+
+ if "Normal" in render_keys:
+ normal = camera.get_picture("Normal")[..., :3]
+ normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
+ results["Normal"] = Image.fromarray(normal_img)
+
+ if "Depth" in render_keys:
+ position_map = camera.get_picture("Position")
+ depth = -position_map[..., 2]
+ alpha = torch.tensor(color[..., 3], dtype=torch.float32)
+ norm_depth = DiffrastRender.normalize_map_by_mask(
+ torch.tensor(depth), alpha
+ )
+ depth_img = (norm_depth * 255).to(torch.uint8).numpy()
+ results["Depth"] = Image.fromarray(depth_img)
+
+ return results
+
+
class SapienSceneManager:
"""A class to manage SAPIEN simulator."""
@@ -81,11 +339,6 @@ class SapienSceneManager:
sapien.render.set_ray_tracing_path_depth(10)
sapien.render.set_ray_tracing_denoiser("oidn")
- # TODO: Support GPU.
- # import sapien.physx as physx
- # physx.enable_gpu()
- # scene = sapien.Scene(systems=[sapien.physx.PhysxGpuSystem(), sapien.render.RenderSystem()])
-
scene = sapien.Scene()
scene.set_timestep(1 / self.sim_freq)
@@ -114,12 +367,13 @@ class SapienSceneManager:
def step_action(
self,
+ agent: BaseAgent,
action: torch.Tensor,
- sim_steps_per_control: int,
cameras: list[sapien.render.RenderCameraComponent],
render_keys: list[str],
+ sim_steps_per_control: int = 1,
) -> dict:
- self.robot.set_action(action)
+ agent.set_action(action)
frames = defaultdict(list)
for _ in range(sim_steps_per_control):
self.scene.step()
@@ -127,125 +381,11 @@ class SapienSceneManager:
self.scene.update_render()
for camera in cameras:
camera.take_picture()
- images = self.render_images(camera, render_keys=render_keys)
+ images = render_images(camera, render_keys=render_keys)
frames[camera.name].append(images)
return frames
- def load_exists_robot(
- self,
- robot_name: str,
- pose: sapien.Pose,
- control_freq: int = 20,
- robot_init_qpos_noise: float = 0.02,
- control_mode: str = "pd_joint_pos",
- backend_str: tuple[str, str] = ("cpu", "gpu"),
- ):
- from mani_skill.agents import REGISTERED_AGENTS
- from mani_skill.envs.scene import ManiSkillScene
- from mani_skill.envs.utils.system.backend import (
- parse_sim_and_render_backend,
- )
-
- if robot_name == "franka":
- robot_name = "panda" # remapping
- if robot_name not in REGISTERED_AGENTS:
- raise KeyError(
- f"Robot `{robot_name}` not registered, chosen from {REGISTERED_AGENTS.keys()}."
- )
-
- ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
- # backend = parse_sim_and_render_backend("physx_cuda", "gpu")
- backend = parse_sim_and_render_backend(*backend_str)
- scene = ManiSkillScene(
- [self.scene], device=backend_str[0], backend=backend
- )
- robot = ROBOT_CLS(
- scene=scene,
- control_freq=control_freq,
- control_mode=control_mode,
- initial_pose=pose,
- )
- # Set robot init joint rad agree(joint0 to joint6 w 2 finger).
- qpos = np.array(
- [
- 0.0,
- np.pi / 8,
- 0,
- -np.pi * 3 / 8,
- 0,
- np.pi * 3 / 4,
- np.pi / 4,
- 0.04,
- 0.04,
- ]
- )
- qpos = (
- np.random.normal(
- 0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
- )
- + qpos
- )
- qpos[:, -2:] = 0.04
- robot.reset(qpos)
- robot.init_qpos = robot.robot.qpos
- robot.controller.controllers["gripper"].reset()
-
- return robot
-
- def load_actor_from_urdf(
- self,
- file_path: str,
- pose: sapien.Pose,
- use_static: bool = False,
- update_mass: bool = False,
- ) -> sapien.pysapien.Entity:
- """Load an actor from a URDF file into the scene.
-
- Args:
- file_path (str): Path to the URDF file.
- pose (sapien.Pose): Pose of the actor.
- use_static (bool): Whether to create a static actor.
- update_mass (bool): Whether to update the actor's mass from the URDF.
-
- Returns:
- sapien.pysapien.Entity: The loaded actor.
- """
- tree = ET.parse(file_path)
- root = tree.getroot()
- node_name = root.get("name")
- file_dir = os.path.dirname(file_path)
- visual_file = root.find('.//visual/geometry/mesh').get("filename")
- collision_file = root.find('.//collision/geometry/mesh').get(
- "filename"
- )
- visual_file = os.path.join(file_dir, visual_file)
- collision_file = os.path.join(file_dir, collision_file)
- static_fric = root.find('.//collision/gazebo/mu1').text
- dynamic_fric = root.find('.//collision/gazebo/mu2').text
-
- material = self.scene.create_physical_material(
- static_friction=np.clip(float(static_fric), 0.1, 0.7),
- dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
- restitution=0.05,
- )
- builder = self.scene.create_actor_builder()
- body_type = "static" if use_static else "dynamic"
- builder.set_physx_body_type(body_type)
- builder.add_multiple_convex_collisions_from_file(
- collision_file, material=material
- )
- builder.add_visual_from_file(visual_file)
- actor = builder.build(name=node_name)
- actor.set_name(node_name)
- actor.set_pose(pose)
-
- if update_mass and hasattr(actor.components[1], "mass"):
- node_mass = float(root.find('.//inertial/mass').get("value"))
- actor.components[1].set_mass(node_mass)
-
- return actor
-
def create_camera(
self,
cam_name: str,
@@ -344,257 +484,78 @@ class SapienSceneManager:
return self.cameras
- def render_images(
- self,
- camera: sapien.render.RenderCameraComponent,
- render_keys: list[
- Literal[
- "Color",
- "Segmentation",
- "Normal",
- "Mask",
- "Depth",
- "Foreground",
- ]
- ] = None,
- ) -> dict[str, Image.Image]:
- """Render images from a given camera.
-
- Args:
- camera (sapien.render.RenderCameraComponent): The camera to render from.
- render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).
-
- Returns:
- Dict[str, Image.Image]: Dictionary of rendered images.
- """
- if render_keys is None:
- render_keys = [
- "Color",
- "Segmentation",
- "Normal",
- "Mask",
- "Depth",
- "Foreground",
- ]
-
- results: dict[str, Image.Image] = {}
- if "Color" in render_keys:
- color = camera.get_picture("Color")
- color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
- results["Color"] = Image.fromarray(color_rgb)
-
- if "Mask" in render_keys:
- alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
- results["Mask"] = Image.fromarray(alpha)
-
- if "Segmentation" in render_keys:
- seg_labels = camera.get_picture("Segmentation")
- label0 = seg_labels[..., 0].astype(np.uint8)
- seg_color = self.COLOR_PALETTE[label0]
- results["Segmentation"] = Image.fromarray(seg_color)
-
- if "Foreground" in render_keys:
- seg_labels = camera.get_picture("Segmentation")
- label0 = seg_labels[..., 0]
- mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
- color = camera.get_picture("Color")
- color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
- foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
- results["Foreground"] = Image.fromarray(foreground)
-
- if "Normal" in render_keys:
- normal = camera.get_picture("Normal")[..., :3]
- normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
- results["Normal"] = Image.fromarray(normal_img)
-
- if "Depth" in render_keys:
- position_map = camera.get_picture("Position")
- depth = -position_map[..., 2]
- alpha = torch.tensor(color[..., 3], dtype=torch.float32)
- norm_depth = DiffrastRender.normalize_map_by_mask(
- torch.tensor(depth), alpha
- )
- depth_img = (norm_depth * 255).to(torch.uint8).numpy()
- results["Depth"] = Image.fromarray(depth_img)
-
- return results
-
- def load_assets_from_layout_file(
- self,
- layout: LayoutInfo,
- z_offset: float = 0.0,
- init_quat: list[float] = [0, 0, 0, 1],
- robot_name: str = None,
- control_freq: int = 40,
- robot_init_qpos_noise: float = 0,
- ) -> None:
- """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.
-
- Args:
- layout (LayoutInfo): The layout information data.
- z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
- init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
- robot_name (str): Name of the robot to load, None not load robot.
- control_freq (int): Control frequency for the robot, default is 40.
- robot_init_qpos_noise (float): Noise to add to the robot's initial joint positions, default is 0.
- """
- for node in layout.assets:
- file_dir = layout.assets[node]
- file_name = f"{node.replace(' ', '_')}.urdf"
- urdf_file = os.path.join(file_dir, file_name)
-
- if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
- continue
-
- position = layout.position[node].copy()
- if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
- position[2] += z_offset
-
- use_static = (
- layout.relation.get(Scene3DItemEnum.CONTEXT.value, None)
- == node
- )
-
- # Combine initial quaternion with object quaternion
- x, y, z, qx, qy, qz, qw = position
- qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
- actor = self.load_actor_from_urdf(
- urdf_file,
- sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
- use_static=use_static,
- update_mass=False,
- )
- self.actors[node] = actor
-
- if robot_name is not None:
- robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
- x, y, z, qx, qy, qz, qw = layout.position[robot_node]
- delta_z = 0.002 # Add small offset to avoid collision.
- pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
- self.robot = self.load_exists_robot(
- robot_name, pose, control_freq, robot_init_qpos_noise
- )
-
- return
-
class FrankaPandaGrasper(object):
- """Grasper planner for Franka Panda robot arm.
-
- Args:
- robot: See from `mani_skill.agents.robots.panda.panda`.
- scene: The `SapienSceneManager` object.
- sim_steps_per_control: Number of simulation steps per control step.
- control_timestep: Time step for planning.
- joint_vel_limits: Velocity limit for all joints (default: 0.9)
- joint_acc_limits: Acceleration limit for all joints (default: 0.9)
- finger_length: Length of the robot's fingers (default: 0.025)
- """
-
def __init__(
self,
- robot,
- scene: SapienSceneManager,
- sim_steps_per_control: int,
- control_timestep: float,
- joint_vel_limits: float = 0.9,
- joint_acc_limits: float = 0.9,
+ agent: BaseAgent,
+ control_freq: float,
+ joint_vel_limits: float = 2.0,
+ joint_acc_limits: float = 1.0,
finger_length: float = 0.025,
) -> None:
- self.agent = robot
- self.robot = robot.robot
- self.scene = scene
- self.sim_steps_per_control = sim_steps_per_control
- self.control_timestep = control_timestep
+ self.agent = agent
+ self.robot = agent.robot
+ self.control_freq = control_freq
+ self.control_timestep = 1 / control_freq
self.joint_vel_limits = joint_vel_limits
self.joint_acc_limits = joint_acc_limits
self.finger_length = finger_length
- self.planner = self._setup_planner()
+ self.planners = self._setup_planner()
def _setup_planner(self) -> mplib.Planner:
- link_names = [link.get_name() for link in self.robot.get_links()]
- joint_names = [
- joint.get_name() for joint in self.robot.get_active_joints()
- ]
-
- planner = mplib.Planner(
- urdf=self.agent.urdf_path,
- srdf=self.agent.urdf_path.replace(".urdf", ".srdf"),
- user_link_names=link_names,
- user_joint_names=joint_names,
- move_group="panda_hand_tcp",
- joint_vel_limits=np.ones(7) * self.joint_vel_limits,
- joint_acc_limits=np.ones(7) * self.joint_acc_limits,
- )
- planner.set_base_pose(self.robot.pose.raw_pose[0].tolist())
-
- return planner
-
- def update_state(
- self,
- result: dict,
- gripper_state: Literal[-1, 1],
- sim_steps_per_control: int,
- cameras: list[sapien.render.RenderCameraComponent],
- render_keys: list[str],
- ) -> dict[str, list[Image.Image]]:
- n_step = len(result["position"])
- total_frames = defaultdict(list)
- for i in tqdm(range(n_step), desc="Grasping"):
- qpos = result["position"][min(i, n_step - 1)]
- action = np.hstack([qpos, gripper_state])[None, ...]
- action = torch.from_numpy(action).float()
- frames = self.scene.step_action(
- action, sim_steps_per_control, cameras, render_keys
+ planners = []
+ for pose in self.robot.pose:
+ link_names = [link.get_name() for link in self.robot.get_links()]
+ joint_names = [
+ joint.get_name() for joint in self.robot.get_active_joints()
+ ]
+ planner = mplib.Planner(
+ urdf=self.agent.urdf_path,
+ srdf=self.agent.urdf_path.replace(".urdf", ".srdf"),
+ user_link_names=link_names,
+ user_joint_names=joint_names,
+ move_group="panda_hand_tcp",
+ joint_vel_limits=np.ones(7) * self.joint_vel_limits,
+ joint_acc_limits=np.ones(7) * self.joint_acc_limits,
)
- for camera in cameras:
- total_frames[camera.name].extend(frames[camera.name])
+ planner.set_base_pose(pose.raw_pose[0].tolist())
+ planners.append(planner)
- return total_frames
+ return planners
def control_gripper(
self,
- sim_steps_per_control: int,
- cameras: list[sapien.render.RenderCameraComponent],
- render_keys: list[str],
gripper_state: Literal[-1, 1],
n_step: int = 10,
- ) -> dict[str, list[Image.Image]]:
- total_frames = defaultdict(list)
+ ) -> np.ndarray:
qpos = self.robot.get_qpos()[0, :-2].cpu().numpy()
+ actions = []
for _ in range(n_step):
action = np.hstack([qpos, gripper_state])[None, ...]
- action = torch.from_numpy(action).float()
- frames = self.scene.step_action(
- action, sim_steps_per_control, cameras, render_keys
- )
- for camera in cameras:
- total_frames[camera.name].extend(frames[camera.name])
+ actions.append(action)
- return total_frames
+ return np.concatenate(actions, axis=0)
def move_to_pose(
self,
pose: sapien.Pose,
- sim_steps_per_control: int,
control_timestep: float,
- cameras: list[sapien.render.RenderCameraComponent],
- render_keys: list[str],
gripper_state: Literal[-1, 1],
use_point_cloud: bool = False,
n_max_step: int = 100,
- ) -> dict[str, list[Image.Image]]:
- # First try screw motion planning
- result = self.planner.plan_screw(
+ action_key: str = "position",
+ env_idx: int = 0,
+ ) -> np.ndarray:
+ result = self.planners[env_idx].plan_qpos_to_pose(
np.concatenate([pose.p, pose.q]),
self.robot.get_qpos().cpu().numpy()[0],
time_step=control_timestep,
use_point_cloud=use_point_cloud,
)
- # Fallback to standard planning if screw fails
if result["status"] != "Success":
- result = self.planner.plan_qpos_to_pose(
+ result = self.planners[env_idx].plan_screw(
np.concatenate([pose.p, pose.q]),
self.robot.get_qpos().cpu().numpy()[0],
time_step=control_timestep,
@@ -602,44 +563,34 @@ class FrankaPandaGrasper(object):
)
if result["status"] != "Success":
- return defaultdict(list)
+ return
- for key in result:
- if key in ["status", "duration"]:
- continue
- sample_ratio = (len(result[key]) // n_max_step) + 1
- result[key] = result[key][::sample_ratio]
+ sample_ratio = (len(result[action_key]) // n_max_step) + 1
+ result[action_key] = result[action_key][::sample_ratio]
- frames = self.update_state(
- result, gripper_state, sim_steps_per_control, cameras, render_keys
- )
+ n_step = len(result[action_key])
+ actions = []
+ for i in range(n_step):
+ qpos = result[action_key][i]
+ action = np.hstack([qpos, gripper_state])[None, ...]
+ actions.append(action)
- return frames
+ return np.concatenate(actions, axis=0)
- def render_grasp(
+ def compute_grasp_action(
self,
actor: sapien.pysapien.Entity,
- cameras: list[sapien.render.RenderCameraComponent],
- render_keys: list[
- Literal[
- "Color",
- "Foreground",
- "Segmentation",
- "Normal",
- "Mask",
- "Depth",
- ]
- ],
reach_target_only: bool = True,
- ) -> dict[str, list[Image.Image]]:
+ offset: tuple[float, float, float] = [0, 0, -0.05],
+ env_idx: int = 0,
+ ) -> np.ndarray:
physx_rigid = actor.components[1]
mesh = get_component_mesh(physx_rigid, to_world_frame=True)
obb = mesh.bounding_box_oriented
approaching = np.array([0, 0, -1])
+ tcp_pose = self.agent.tcp.pose[env_idx]
target_closing = (
- self.agent.tcp.pose.to_transformation_matrix()[0, :3, 1]
- .cpu()
- .numpy()
+ tcp_pose.to_transformation_matrix()[0, :3, 1].cpu().numpy()
)
grasp_info = compute_grasp_info_by_obb(
obb,
@@ -647,60 +598,46 @@ class FrankaPandaGrasper(object):
target_closing=target_closing,
depth=self.finger_length,
)
- closing, center = grasp_info["closing"], grasp_info["center"]
- raw_tcp_pose = self.agent.tcp.pose.sp
- grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
- reach_pose = grasp_pose * sapien.Pose(p=[0, 0, -0.05])
- grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
- total_frames = defaultdict(list)
- frames_list = []
- reach_frames = self.move_to_pose(
+ closing, center = grasp_info["closing"], grasp_info["center"]
+ raw_tcp_pose = tcp_pose.sp
+ grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
+ reach_pose = grasp_pose * sapien.Pose(p=offset)
+ grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
+ actions = []
+ reach_actions = self.move_to_pose(
reach_pose,
- self.sim_steps_per_control,
self.control_timestep,
- cameras,
- render_keys,
gripper_state=1,
+ env_idx=env_idx,
)
- frames_list.append(reach_frames)
- if len(reach_frames) == 0:
+ actions.append(reach_actions)
+
+ if reach_actions is None:
logger.warning(
f"Failed to reach the grasp pose for node `{actor.name}`, skipping grasping."
)
- return total_frames
+ return None
if not reach_target_only:
- grasp_frames = self.move_to_pose(
+ grasp_actions = self.move_to_pose(
grasp_pose,
- self.sim_steps_per_control,
self.control_timestep,
- cameras,
- render_keys,
gripper_state=1,
+ env_idx=env_idx,
)
- frames_list.append(grasp_frames)
- close_frames = self.control_gripper(
- self.sim_steps_per_control,
- cameras,
- render_keys,
+ actions.append(grasp_actions)
+ close_actions = self.control_gripper(
gripper_state=-1,
+ env_idx=env_idx,
)
- frames_list.append(close_frames)
- back_frames = self.move_to_pose(
+ actions.append(close_actions)
+ back_actions = self.move_to_pose(
raw_tcp_pose,
- self.sim_steps_per_control,
self.control_timestep,
- cameras,
- render_keys,
gripper_state=-1,
+ env_idx=env_idx,
)
- frames_list.append(back_frames)
+ actions.append(back_actions)
- for frame_dict in frames_list:
- for cam_name, frames in frame_dict.items():
- total_frames[cam_name].extend(frames)
-
- self.agent.reset(self.agent.init_qpos)
-
- return total_frames
+ return np.concatenate(actions, axis=0)
diff --git a/embodied_gen/utils/tags.py b/embodied_gen/utils/tags.py
index 35b2387..4e6a7d5 100644
--- a/embodied_gen/utils/tags.py
+++ b/embodied_gen/utils/tags.py
@@ -1 +1 @@
-VERSION = "v0.1.3"
+VERSION = "v0.1.4"
diff --git a/embodied_gen/validators/urdf_convertor.py b/embodied_gen/validators/urdf_convertor.py
index 8940ed8..0fa9461 100644
--- a/embodied_gen/validators/urdf_convertor.py
+++ b/embodied_gen/validators/urdf_convertor.py
@@ -24,6 +24,7 @@ from xml.dom.minidom import parseString
import numpy as np
import trimesh
+from embodied_gen.data.convex_decomposer import decompose_convex_mesh
from embodied_gen.utils.gpt_clients import GPT_CLIENT, GPTclient
from embodied_gen.utils.process_media import render_asset3d
from embodied_gen.utils.tags import VERSION
@@ -84,6 +85,7 @@ class URDFGenerator(object):
attrs_name: list[str] = None,
render_dir: str = "urdf_renders",
render_view_num: int = 4,
+ decompose_convex: bool = False,
) -> None:
if mesh_file_list is None:
mesh_file_list = []
@@ -132,8 +134,9 @@ class URDFGenerator(object):
Estimate the vertical projection of their real length based on its pose.
For example:
- A pen standing upright in the first image (aligned with the image's vertical axis)
- full body visible in the first image: → vertical height ≈ 0.14-0.20 m
- - A pen lying flat in the first image (showing thickness or as a dot) → vertical height ≈ 0.018-0.025 m
+ full body visible in the first image: → vertical height ≈ 0.14-0.20 m
+ - A pen lying flat in the first image or either the tip or the tail is facing the image
+ (showing thickness or as a circle) → vertical height ≈ 0.018-0.025 m
- Tilted pen in the first image (e.g., ~45° angle): vertical height ≈ 0.07-0.12 m
- Use the rest views to help determine the object's 3D pose and orientation.
Assume the object is in real-world scale and estimate the approximate vertical height
@@ -156,6 +159,7 @@ class URDFGenerator(object):
"gs_model",
]
self.attrs_name = attrs_name
+ self.decompose_convex = decompose_convex
def parse_response(self, response: str) -> dict[str, any]:
lines = response.split("\n")
@@ -258,9 +262,24 @@ class URDFGenerator(object):
# Update collision geometry
collision = link.find("collision/geometry/mesh")
if collision is not None:
- collision.set(
- "filename", os.path.join(self.output_mesh_dir, obj_name)
- )
+ collision_mesh = os.path.join(self.output_mesh_dir, obj_name)
+ if self.decompose_convex:
+ try:
+ d_params = dict(
+ threshold=0.05, max_convex_hull=64, verbose=False
+ )
+ filename = f"{os.path.splitext(obj_name)[0]}_collision.ply"
+ output_path = os.path.join(mesh_folder, filename)
+ decompose_convex_mesh(
+ mesh_output_path, output_path, **d_params
+ )
+ collision_mesh = f"{self.output_mesh_dir}/{filename}"
+ except Exception as e:
+ logger.warning(
+ f"Convex decomposition failed for {output_path}, {e}."
+ "Use original mesh for collision computation."
+ )
+ collision.set("filename", collision_mesh)
collision.set("scale", "1.0 1.0 1.0")
# Update friction coefficients
diff --git a/install/install_basic.sh b/install/install_basic.sh
index dc90446..a81c836 100644
--- a/install/install_basic.sh
+++ b/install/install_basic.sh
@@ -9,10 +9,10 @@ PIP_INSTALL_PACKAGES=(
"xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118"
"flash-attn==2.7.0.post2 --no-build-isolation"
"-r requirements.txt --use-deprecated=legacy-resolver"
- "utils3d@git+https://github.com/EasternJournalist/utils3d.git#egg=9a4eb15"
+ "utils3d@git+https://github.com/EasternJournalist/utils3d.git@9a4eb15"
"clip@git+https://github.com/openai/CLIP.git"
- "segment-anything@git+https://github.com/facebookresearch/segment-anything.git#egg=dca509f"
- "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git#egg=729261d"
+ "segment-anything@git+https://github.com/facebookresearch/segment-anything.git@dca509f"
+ "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git@729261d"
"kolors@git+https://github.com/HochCC/Kolors.git"
"kaolin@git+https://github.com/NVIDIAGameWorks/kaolin.git@v0.16.0"
"git+https://github.com/nerfstudio-project/gsplat.git@v1.5.3"
diff --git a/pyproject.toml b/pyproject.toml
index cf14b78..12b7da5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -7,7 +7,7 @@ packages = ["embodied_gen"]
[project]
name = "embodied_gen"
-version = "v0.1.3"
+version = "v0.1.4"
readme = "README.md"
license = "Apache-2.0"
license-files = ["LICENSE", "NOTICE"]
diff --git a/requirements.txt b/requirements.txt
index 3fa2137..71507fe 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -38,5 +38,6 @@ tyro
pyquaternion
shapely
sapien==3.0.0b1
+coacd
mani_skill==3.0.0b21
typing_extensions==4.14.1
diff --git a/tests/test_examples/test_urdf_convertor.py b/tests/test_examples/test_urdf_convertor.py
index 10f4dab..3795e3b 100644
--- a/tests/test_examples/test_urdf_convertor.py
+++ b/tests/test_examples/test_urdf_convertor.py
@@ -39,6 +39,7 @@ def test_urdf_convertor():
"outputs/test_urdf/notebook/result/mesh/notebook.obj",
"outputs/test_urdf/marker/result/mesh/marker.obj",
"outputs/test_urdf/pen2/result/mesh/pen.obj",
+ "outputs/test_urdf/pen3/result/mesh/pen.obj",
]
for idx, mesh_path in enumerate(mesh_paths):
filename = mesh_path.split("/")[-1].split(".")[0]
@@ -49,3 +50,19 @@ def test_urdf_convertor():
# min_height=1.0,
# max_height=1.2,
)
+
+
+def test_decompose_convex_mesh():
+ urdf_gen = URDFGenerator(GPT_CLIENT, decompose_convex=True)
+ mesh_paths = [
+ "outputs/test_urdf/sample_0/mesh/pen.obj",
+ "outputs/test_urdf/sample_1/mesh/notepad.obj",
+ "outputs/test_urdf/sample_2/mesh/plate.obj",
+ ]
+ for idx, mesh_path in enumerate(mesh_paths):
+ filename = mesh_path.split("/")[-1].split(".")[0]
+ urdf_path = urdf_gen(
+ mesh_path=mesh_path,
+ output_root=f"outputs/test_urdf3/sample_{idx}",
+ category=filename,
+ )
diff --git a/tests/test_unit/test_mesh_process.py b/tests/test_unit/test_mesh_process.py
new file mode 100644
index 0000000..7a9f982
--- /dev/null
+++ b/tests/test_unit/test_mesh_process.py
@@ -0,0 +1,44 @@
+# Project EmbodiedGen
+#
+# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+# implied. See the License for the specific language governing
+# permissions and limitations under the License.
+
+
+import logging
+import tempfile
+from time import time
+
+import pytest
+from embodied_gen.data.convex_decomposer import decompose_convex_mesh
+
+logger = logging.getLogger(__name__)
+
+
+@pytest.mark.parametrize(
+ "input_mesh_path, max_convex_hull",
+ [
+ ("apps/assets/example_texture/meshes/robot_text.obj", 8),
+ # ("apps/assets/example_texture/meshes/robot_text.obj", 32),
+ # ("apps/assets/example_texture/meshes/robot_text.obj", 64),
+ # ("apps/assets/example_texture/meshes/robot_text.obj", 128),
+ ],
+)
+def test_decompose_convex_mesh(input_mesh_path, max_convex_hull):
+ d_params = dict(
+ threshold=0.05, max_convex_hull=max_convex_hull, verbose=False
+ )
+ with tempfile.NamedTemporaryFile(suffix='.ply', delete=True) as tmp_file:
+ start_time = time()
+ decompose_convex_mesh(input_mesh_path, tmp_file.name, **d_params)
+ logger.info(f"Finished in {time()-start_time:.2f}s")