diff --git a/CHANGELOG.md b/CHANGELOG.md index 2a39367..36badac 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0). +## [0.1.4] - 2025-08-29 +### Feature +- Supports importing the output of the EmbodiedGen Layout Generation module with a single `gym.make` command to create multiple parallel simulation environments and interactive 3D worlds. +- Supports coacd-based mesh decomposition for better efficiency and accuracy in physical simulations. + ## [0.1.3] - 2025-08-19 ### Feature - Release `layout-cli` 🏞️ Layout(Interactive 3D Worlds) Generation, generate interactive 3D scene from task description. diff --git a/README.md b/README.md index f501f3f..3099ddd 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,7 @@ ```sh git clone https://github.com/HorizonRobotics/EmbodiedGen.git cd EmbodiedGen -git checkout v0.1.3 +git checkout v0.1.4 git submodule update --init --recursive --progress conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env. conda activate embodiedgen @@ -67,7 +67,8 @@ You can choose between two backends for the GPT agent: [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Image_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D) Generate physically plausible 3D asset URDF from single input image, offering high-quality support for digital twin systems. (HF space is a simplified demonstration. For the full functionality, please refer to `img3d-cli`.) -Image to 3D + +Image to 3D ### ☁️ Service Run the image-to-3D generation service locally. @@ -95,7 +96,7 @@ img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/examp [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Text_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D) Create 3D assets from text descriptions for a wide range of geometry and styles. (HF space is a simplified demonstration. For the full functionality, please refer to `text3d-cli`.) -Text to 3D +Text to 3D ### ☁️ Service Deploy the text-to-3D generation service locally. @@ -133,7 +134,7 @@ ps: models with more permissive licenses found in `embodied_gen/models/image_com [![🤗 Hugging Face](https://img.shields.io/badge/🤗-Texture_Gen_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen) Generate visually rich textures for 3D mesh. -Texture Gen +Texture Gen ### ☁️ Service @@ -161,7 +162,7 @@ bash embodied_gen/scripts/texture_gen.sh \

🌍 3D Scene Generation

-scene3d +scene3d ### ⚡ API > Run `bash install.sh extra` to install additional requirements if you need to use `scene3d-cli`. @@ -184,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \ 🚧 *Coming Soon* -articulate +articulate --- @@ -196,12 +197,12 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \ - - + + - - + +
layout1layout2layout1layout2
layout3layout4layout3layout4
@@ -219,8 +220,8 @@ layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on - - + +
Iscene_demo1Iscene_demo2Iscene_demo1Iscene_demo2
@@ -233,7 +234,7 @@ CUDA_VISIBLE_DEVICES=0 nohup layout-cli \ --output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log & ``` -Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.) +Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes. ```sh python embodied_gen/scripts/compose_layout.py \ --layout_path "outputs/layouts_gens/task_0000/layout.json" \ @@ -243,13 +244,29 @@ python embodied_gen/scripts/compose_layout.py \ We provide `sim-cli`, that allows users to easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (will support for more simulators in future updates). ```sh -sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \ ---output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka" +sim-cli --layout_path "outputs/layouts_gen/task_0000/layout.json" \ +--output_dir "outputs/layouts_gen/task_0000/sapien_render" --insert_robot +``` + +Example: generate multiple parallel simulation envs with `gym.make` and record sensor and trajectory data. + + + + + + +
parallel_sim1parallel_sim2
+ +```sh +python embodied_gen/scripts/parallel_sim.py \ +--layout_file "outputs/layouts_gen/task_0000/layout.json" \ +--output_dir "outputs/parallel_sim/task_0000" \ +--num_envs 20 ``` ### 🖼️ Real-to-Sim Digital Twin -real2sim_mujoco +real2sim_mujoco --- @@ -280,7 +297,7 @@ If you use EmbodiedGen in your research or projects, please cite: ## 🙌 Acknowledgement EmbodiedGen builds upon the following amazing projects and models: -🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) +🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | 🌟 [ManiSkill](https://github.com/haosulab/ManiSkill) --- diff --git a/apps/assets/parallel_sim.gif b/apps/assets/parallel_sim.gif new file mode 100644 index 0000000..dd4aea0 Binary files /dev/null and b/apps/assets/parallel_sim.gif differ diff --git a/apps/assets/parallel_sim2.gif b/apps/assets/parallel_sim2.gif new file mode 100644 index 0000000..10665f8 Binary files /dev/null and b/apps/assets/parallel_sim2.gif differ diff --git a/apps/common.py b/apps/common.py index e0e71bf..643151b 100644 --- a/apps/common.py +++ b/apps/common.py @@ -547,7 +547,9 @@ def extract_urdf( # Convert to URDF and recover attrs by GPT. filename = "sample" - urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4) + urdf_convertor = URDFGenerator( + GPT_CLIENT, render_view_num=4, decompose_convex=True + ) asset_attrs = { "version": VERSION, "gs_model": f"{urdf_convertor.output_mesh_dir}/{filename}_gs.ply", diff --git a/embodied_gen/data/convex_decomposer.py b/embodied_gen/data/convex_decomposer.py new file mode 100644 index 0000000..4fd2490 --- /dev/null +++ b/embodied_gen/data/convex_decomposer.py @@ -0,0 +1,161 @@ +# Project EmbodiedGen +# +# Copyright (c) 2025 Horizon Robotics. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +# implied. See the License for the specific language governing +# permissions and limitations under the License. + +import logging +import multiprocessing as mp +import os + +import coacd +import trimesh + +logger = logging.getLogger(__name__) + +__all__ = [ + "decompose_convex_coacd", + "decompose_convex_mesh", + "decompose_convex_process", +] + + +def decompose_convex_coacd( + filename: str, outfile: str, params: dict, verbose: bool = False +) -> None: + coacd.set_log_level("info" if verbose else "warn") + + mesh = trimesh.load(filename, force="mesh") + mesh = coacd.Mesh(mesh.vertices, mesh.faces) + + result = coacd.run_coacd(mesh, **params) + combined = sum([trimesh.Trimesh(*m) for m in result]) + combined.export(outfile) + + +def decompose_convex_mesh( + filename: str, + outfile: str, + threshold: float = 0.05, + max_convex_hull: int = -1, + preprocess_mode: str = "auto", + preprocess_resolution: int = 30, + resolution: int = 2000, + mcts_nodes: int = 20, + mcts_iterations: int = 150, + mcts_max_depth: int = 3, + pca: bool = False, + merge: bool = True, + seed: int = 0, + verbose: bool = False, +) -> str: + """Decompose a mesh into convex parts using the CoACD algorithm.""" + coacd.set_log_level("info" if verbose else "warn") + + if os.path.exists(outfile): + logger.warning(f"Output file {outfile} already exists, removing it.") + os.remove(outfile) + + params = dict( + threshold=threshold, + max_convex_hull=max_convex_hull, + preprocess_mode=preprocess_mode, + preprocess_resolution=preprocess_resolution, + resolution=resolution, + mcts_nodes=mcts_nodes, + mcts_iterations=mcts_iterations, + mcts_max_depth=mcts_max_depth, + pca=pca, + merge=merge, + seed=seed, + ) + + try: + decompose_convex_coacd(filename, outfile, params, verbose) + if os.path.exists(outfile): + return outfile + except Exception as e: + if verbose: + print(f"Decompose convex first attempt failed: {e}.") + + if preprocess_mode != "on": + try: + params["preprocess_mode"] = "on" + decompose_convex_coacd(filename, outfile, params, verbose) + if os.path.exists(outfile): + return outfile + except Exception as e: + if verbose: + print( + f"Decompose convex second attempt with preprocess_mode='on' failed: {e}" + ) + + raise RuntimeError(f"Convex decomposition failed on {filename}") + + +def decompose_convex_mp( + filename: str, + outfile: str, + threshold: float = 0.05, + max_convex_hull: int = -1, + preprocess_mode: str = "auto", + preprocess_resolution: int = 30, + resolution: int = 2000, + mcts_nodes: int = 20, + mcts_iterations: int = 150, + mcts_max_depth: int = 3, + pca: bool = False, + merge: bool = True, + seed: int = 0, + verbose: bool = False, +) -> str: + """Decompose a mesh into convex parts using the CoACD algorithm in a separate process. + + See https://simulately.wiki/docs/toolkits/ConvexDecomp for details. + """ + params = dict( + threshold=threshold, + max_convex_hull=max_convex_hull, + preprocess_mode=preprocess_mode, + preprocess_resolution=preprocess_resolution, + resolution=resolution, + mcts_nodes=mcts_nodes, + mcts_iterations=mcts_iterations, + mcts_max_depth=mcts_max_depth, + pca=pca, + merge=merge, + seed=seed, + ) + + ctx = mp.get_context("spawn") + p = ctx.Process( + target=decompose_convex_coacd, + args=(filename, outfile, params, verbose), + ) + p.start() + p.join() + if p.exitcode == 0 and os.path.exists(outfile): + return outfile + + if preprocess_mode != "on": + params["preprocess_mode"] = "on" + p = ctx.Process( + target=decompose_convex_coacd, + args=(filename, outfile, params, verbose), + ) + p.start() + p.join() + if p.exitcode == 0 and os.path.exists(outfile): + return outfile + + raise RuntimeError(f"Convex decomposition failed on {filename}") diff --git a/embodied_gen/data/mesh_operator.py b/embodied_gen/data/mesh_operator.py index 38f0563..8d41ba0 100644 --- a/embodied_gen/data/mesh_operator.py +++ b/embodied_gen/data/mesh_operator.py @@ -16,13 +16,17 @@ import logging +import multiprocessing as mp +import os from typing import Tuple, Union +import coacd import igraph import numpy as np import pyvista as pv import spaces import torch +import trimesh import utils3d from pymeshfix import _meshfix from tqdm import tqdm @@ -33,7 +37,9 @@ logging.basicConfig( logger = logging.getLogger(__name__) -__all__ = ["MeshFixer"] +__all__ = [ + "MeshFixer", +] def _radical_inverse(base, n): diff --git a/embodied_gen/envs/pick_embodiedgen.py b/embodied_gen/envs/pick_embodiedgen.py new file mode 100644 index 0000000..9b14070 --- /dev/null +++ b/embodied_gen/envs/pick_embodiedgen.py @@ -0,0 +1,389 @@ +# Project EmbodiedGen +# +# Copyright (c) 2025 Horizon Robotics. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +# implied. See the License for the specific language governing +# permissions and limitations under the License. + +import json +import os +from copy import deepcopy + +import numpy as np +import sapien +import torch +import torchvision.transforms as transforms +from mani_skill.envs.sapien_env import BaseEnv +from mani_skill.sensors.camera import CameraConfig +from mani_skill.utils import sapien_utils +from mani_skill.utils.building import actors +from mani_skill.utils.registration import register_env +from mani_skill.utils.structs.actor import Actor +from mani_skill.utils.structs.pose import Pose +from mani_skill.utils.structs.types import ( + GPUMemoryConfig, + SceneConfig, + SimConfig, +) +from mani_skill.utils.visualization.misc import tile_images +from tqdm import tqdm +from embodied_gen.models.gs_model import GaussianOperator +from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum +from embodied_gen.utils.geometry import bfs_placement, quaternion_multiply +from embodied_gen.utils.log import logger +from embodied_gen.utils.process_media import alpha_blend_rgba +from embodied_gen.utils.simulation import ( + SIM_COORD_ALIGN, + load_assets_from_layout_file, +) + +__all__ = ["PickEmbodiedGen"] + + +@register_env("PickEmbodiedGen-v1", max_episode_steps=100) +class PickEmbodiedGen(BaseEnv): + SUPPORTED_ROBOTS = ["panda", "panda_wristcam", "fetch"] + goal_thresh = 0.0 + + def __init__( + self, + *args, + robot_uids: str | list[str] = "panda", + robot_init_qpos_noise: float = 0.02, + num_envs: int = 1, + reconfiguration_freq: int = None, + **kwargs, + ): + self.robot_init_qpos_noise = robot_init_qpos_noise + if reconfiguration_freq is None: + if num_envs == 1: + reconfiguration_freq = 1 + else: + reconfiguration_freq = 0 + + # Init params from kwargs. + layout_file = kwargs.pop("layout_file", None) + replace_objs = kwargs.pop("replace_objs", True) + self.enable_grasp = kwargs.pop("enable_grasp", False) + self.init_quat = kwargs.pop("init_quat", [0.7071, 0, 0, 0.7071]) + # Add small offset in z-axis to avoid collision. + self.objs_z_offset = kwargs.pop("objs_z_offset", 0.002) + self.robot_z_offset = kwargs.pop("robot_z_offset", 0.002) + + self.layouts = self.init_env_layouts( + layout_file, num_envs, replace_objs + ) + self.robot_pose = self.compute_robot_init_pose( + self.layouts, num_envs, self.robot_z_offset + ) + self.env_actors = dict() + self.image_transform = transforms.PILToTensor() + + super().__init__( + *args, + robot_uids=robot_uids, + reconfiguration_freq=reconfiguration_freq, + num_envs=num_envs, + **kwargs, + ) + + self.bg_images = dict() + if self.render_mode == "hybrid": + self.bg_images = self.render_gs3d_images( + self.layouts, num_envs, self.init_quat + ) + + @staticmethod + def init_env_layouts( + layout_file: str, num_envs: int, replace_objs: bool + ) -> list[LayoutInfo]: + layout = LayoutInfo.from_dict(json.load(open(layout_file, "r"))) + layouts = [] + for env_idx in range(num_envs): + if replace_objs and env_idx > 0: + layout = bfs_placement(deepcopy(layout)) + layouts.append(layout) + + return layouts + + @staticmethod + def compute_robot_init_pose( + layouts: list[LayoutInfo], num_envs: int, z_offset: float = 0.0 + ) -> list[list[float]]: + robot_pose = [] + for env_idx in range(num_envs): + layout = layouts[env_idx] + robot_node = layout.relation[Scene3DItemEnum.ROBOT.value] + x, y, z, qx, qy, qz, qw = layout.position[robot_node] + robot_pose.append([x, y, z + z_offset, qw, qx, qy, qz]) + + return robot_pose + + @property + def _default_sim_config(self): + return SimConfig( + scene_config=SceneConfig( + solver_position_iterations=30, + # contact_offset=0.04, + # rest_offset=0.001, + ), + # sim_freq=200, + control_freq=50, + gpu_memory_config=GPUMemoryConfig( + max_rigid_contact_count=2**20, max_rigid_patch_count=2**19 + ), + ) + + @property + def _default_sensor_configs(self): + pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1]) + + return [ + CameraConfig("base_camera", pose, 128, 128, np.pi / 2, 0.01, 100) + ] + + @property + def _default_human_render_camera_configs(self): + pose = sapien_utils.look_at( + eye=[0.9, 0.0, 1.1], target=[0.0, 0.0, 0.9] + ) + + return CameraConfig( + "render_camera", pose, 256, 256, np.deg2rad(75), 0.01, 100 + ) + + def _load_agent(self, options: dict): + super()._load_agent(options, sapien.Pose(p=[-10, 0, 10])) + + def _load_scene(self, options: dict): + all_objects = [] + logger.info(f"Loading assets and decomposition mesh collisions...") + for env_idx in range(self.num_envs): + env_actors = load_assets_from_layout_file( + self.scene, + self.layouts[env_idx], + z_offset=self.objs_z_offset, + init_quat=self.init_quat, + env_idx=env_idx, + ) + self.env_actors[f"env{env_idx}"] = env_actors + all_objects.extend(env_actors.values()) + + self.obj = all_objects[-1] + for obj in all_objects: + self.remove_from_state_dict_registry(obj) + + self.all_objects = Actor.merge(all_objects, name="all_objects") + self.add_to_state_dict_registry(self.all_objects) + + self.goal_site = actors.build_sphere( + self.scene, + radius=self.goal_thresh, + color=[0, 1, 0, 0], + name="goal_site", + body_type="kinematic", + add_collision=False, + initial_pose=sapien.Pose(), + ) + self._hidden_objects.append(self.goal_site) + + def _initialize_episode(self, env_idx: torch.Tensor, options: dict): + with torch.device(self.device): + b = len(env_idx) + goal_xyz = torch.zeros((b, 3)) + goal_xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1 + self.goal_site.set_pose(Pose.create_from_pq(goal_xyz)) + + qpos = np.array( + [ + 0.0, + np.pi / 8, + 0, + -np.pi * 3 / 8, + 0, + np.pi * 3 / 4, + np.pi / 4, + 0.04, + 0.04, + ] + ) + qpos = ( + np.random.normal( + 0, self.robot_init_qpos_noise, (self.num_envs, len(qpos)) + ) + + qpos + ) + qpos[:, -2:] = 0.04 + self.agent.robot.set_root_pose(np.array(self.robot_pose)) + self.agent.reset(qpos) + self.agent.init_qpos = qpos + self.agent.controller.controllers["gripper"].reset() + + def render_gs3d_images( + self, layouts: list[LayoutInfo], num_envs: int, init_quat: list[float] + ) -> dict[str, np.ndarray]: + sim_coord_align = ( + torch.tensor(SIM_COORD_ALIGN).to(torch.float32).to(self.device) + ) + cameras = self.scene.sensors.copy() + cameras.update(self.scene.human_render_cameras) + + bg_node = layouts[0].relation[Scene3DItemEnum.BACKGROUND.value] + gs_path = os.path.join(layouts[0].assets[bg_node], "gs_model.ply") + raw_gs: GaussianOperator = GaussianOperator.load_from_ply(gs_path) + bg_images = dict() + for env_idx in tqdm(range(num_envs), desc="Pre-rendering Background"): + layout = layouts[env_idx] + x, y, z, qx, qy, qz, qw = layout.position[bg_node] + qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat) + init_pose = torch.tensor([x, y, z, qx, qy, qz, qw]) + gs_model = raw_gs.get_gaussians(instance_pose=init_pose) + for key in cameras: + camera = cameras[key] + Ks = camera.camera.get_intrinsic_matrix() # (n_env, 3, 3) + c2w = camera.camera.get_model_matrix() # (n_env, 4, 4) + result = gs_model.render( + c2w[env_idx] @ sim_coord_align, + Ks[env_idx], + image_width=camera.config.width, + image_height=camera.config.height, + ) + bg_images[f"{key}-env{env_idx}"] = result.rgb[..., ::-1] + + return bg_images + + def render(self): + if self.render_mode is None: + raise RuntimeError("render_mode is not set.") + if self.render_mode == "human": + return self.render_human() + elif self.render_mode == "rgb_array": + res = self.render_rgb_array() + return res + elif self.render_mode == "sensors": + res = self.render_sensors() + return res + elif self.render_mode == "all": + return self.render_all() + elif self.render_mode == "hybrid": + return self.hybrid_render() + else: + raise NotImplementedError( + f"Unsupported render mode {self.render_mode}." + ) + + def render_rgb_array( + self, camera_name: str = None, return_alpha: bool = False + ): + for obj in self._hidden_objects: + obj.show_visual() + self.scene.update_render( + update_sensors=False, update_human_render_cameras=True + ) + images = [] + render_images = self.scene.get_human_render_camera_images( + camera_name, return_alpha + ) + for image in render_images.values(): + images.append(image) + if len(images) == 0: + return None + if len(images) == 1: + return images[0] + for obj in self._hidden_objects: + obj.hide_visual() + return tile_images(images) + + def render_sensors(self): + images = [] + sensor_images = self.get_sensor_images() + for image in sensor_images.values(): + for img in image.values(): + images.append(img) + return tile_images(images) + + def hybrid_render(self): + fg_images = self.render_rgb_array( + return_alpha=True + ) # (n_env, h, w, 3) + images = [] + for key in self.bg_images: + if "render_camera" not in key: + continue + env_idx = int(key.split("-env")[-1]) + rgba = alpha_blend_rgba( + fg_images[env_idx].cpu().numpy(), self.bg_images[key] + ) + images.append(self.image_transform(rgba)) + + images = torch.stack(images, dim=0) + images = images.permute(0, 2, 3, 1) + + return images[..., :3] + + def evaluate(self): + obj_to_goal_pos = ( + self.obj.pose.p + ) # self.goal_site.pose.p - self.obj.pose.p + is_obj_placed = ( + torch.linalg.norm(obj_to_goal_pos, axis=1) <= self.goal_thresh + ) + is_grasped = self.agent.is_grasping(self.obj) + is_robot_static = self.agent.is_static(0.2) + + return dict( + is_grasped=is_grasped, + obj_to_goal_pos=obj_to_goal_pos, + is_obj_placed=is_obj_placed, + is_robot_static=is_robot_static, + is_grasping=self.agent.is_grasping(self.obj), + success=torch.logical_and(is_obj_placed, is_robot_static), + ) + + def _get_obs_extra(self, info: dict): + + return dict() + + def compute_dense_reward(self, obs: any, action: torch.Tensor, info: dict): + tcp_to_obj_dist = torch.linalg.norm( + self.obj.pose.p - self.agent.tcp.pose.p, axis=1 + ) + reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist) + reward = reaching_reward + + is_grasped = info["is_grasped"] + reward += is_grasped + + # obj_to_goal_dist = torch.linalg.norm( + # self.goal_site.pose.p - self.obj.pose.p, axis=1 + # ) + obj_to_goal_dist = torch.linalg.norm( + self.obj.pose.p - self.obj.pose.p, axis=1 + ) + place_reward = 1 - torch.tanh(5 * obj_to_goal_dist) + reward += place_reward * is_grasped + + reward += info["is_obj_placed"] * is_grasped + + static_reward = 1 - torch.tanh( + 5 + * torch.linalg.norm(self.agent.robot.get_qvel()[..., :-2], axis=1) + ) + reward += static_reward * info["is_obj_placed"] * is_grasped + + reward[info["success"]] = 6 + return reward + + def compute_normalized_dense_reward( + self, obs: any, action: torch.Tensor, info: dict + ): + return self.compute_dense_reward(obs=obs, action=action, info=info) / 6 diff --git a/embodied_gen/scripts/gen_layout.py b/embodied_gen/scripts/gen_layout.py index e4255f1..5df6973 100644 --- a/embodied_gen/scripts/gen_layout.py +++ b/embodied_gen/scripts/gen_layout.py @@ -137,7 +137,7 @@ def entrypoint() -> None: sim_cli( layout_path=layout_path, output_dir=output_root, - robot_name="franka" if args.insert_robot else None, + insert_robot=args.insert_robot, ) torch.cuda.empty_cache() diff --git a/embodied_gen/scripts/imageto3d.py b/embodied_gen/scripts/imageto3d.py index bf86534..14d3202 100644 --- a/embodied_gen/scripts/imageto3d.py +++ b/embodied_gen/scripts/imageto3d.py @@ -58,7 +58,7 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "false" os.environ["SPCONV_ALGO"] = "native" random.seed(0) -logger.info("Loading Models...") +logger.info("Loading Image3D Models...") DELIGHT = DelightingModel() IMAGESR_MODEL = ImageRealESRGAN(outscale=4) RBG_REMOVER = RembgRemover() @@ -107,6 +107,7 @@ def parse_args(): type=int, default=2, ) + parser.add_argument("--disable_decompose_convex", action="store_true") args, unknown = parser.parse_known_args() return args @@ -251,7 +252,11 @@ def entrypoint(**kwargs): mesh_glb_path = os.path.join(output_root, f"{filename}.glb") mesh.export(mesh_glb_path) - urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4) + urdf_convertor = URDFGenerator( + GPT_CLIENT, + render_view_num=4, + decompose_convex=not args.disable_decompose_convex, + ) asset_attrs = { "version": VERSION, "gs_model": f"{urdf_convertor.output_mesh_dir}/{filename}_gs.ply", diff --git a/embodied_gen/scripts/parallel_sim.py b/embodied_gen/scripts/parallel_sim.py new file mode 100644 index 0000000..91d3fed --- /dev/null +++ b/embodied_gen/scripts/parallel_sim.py @@ -0,0 +1,148 @@ +# Project EmbodiedGen +# +# Copyright (c) 2025 Horizon Robotics. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +# implied. See the License for the specific language governing +# permissions and limitations under the License. + + +from embodied_gen.utils.monkey_patches import monkey_patch_maniskill + +monkey_patch_maniskill() +import json +from collections import defaultdict +from dataclasses import dataclass +from typing import Literal + +import gymnasium as gym +import numpy as np +import torch +import tyro +from mani_skill.utils.wrappers import RecordEpisode +from tqdm import tqdm +import embodied_gen.envs.pick_embodiedgen +from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum +from embodied_gen.utils.log import logger +from embodied_gen.utils.simulation import FrankaPandaGrasper + + +@dataclass +class ParallelSimConfig: + """CLI parameters for Parallel Sapien simulation.""" + + # Environment configuration + layout_file: str + """Path to the layout JSON file""" + output_dir: str + """Directory to save recorded videos""" + gym_env_name: str = "PickEmbodiedGen-v1" + """Name of the Gym environment to use""" + num_envs: int = 4 + """Number of parallel environments""" + render_mode: Literal["rgb_array", "hybrid"] = "hybrid" + """Rendering mode: rgb_array or hybrid""" + enable_shadow: bool = True + """Whether to enable shadows in rendering""" + control_mode: str = "pd_joint_pos" + """Control mode for the agent""" + + # Recording configuration + max_steps_per_video: int = 1000 + """Maximum steps to record per video""" + save_trajectory: bool = False + """Whether to save trajectory data""" + + # Simulation parameters + seed: int = 0 + """Random seed for environment reset""" + warmup_steps: int = 50 + """Number of warmup steps before action computation""" + reach_target_only: bool = True + """Whether to only reach target without full action""" + + +def entrypoint(**kwargs): + if kwargs is None or len(kwargs) == 0: + cfg = tyro.cli(ParallelSimConfig) + else: + cfg = ParallelSimConfig(**kwargs) + + env = gym.make( + cfg.gym_env_name, + num_envs=cfg.num_envs, + render_mode=cfg.render_mode, + enable_shadow=cfg.enable_shadow, + layout_file=cfg.layout_file, + control_mode=cfg.control_mode, + ) + env = RecordEpisode( + env, + cfg.output_dir, + max_steps_per_video=cfg.max_steps_per_video, + save_trajectory=cfg.save_trajectory, + ) + env.reset(seed=cfg.seed) + + default_action = env.unwrapped.agent.init_qpos[:, :8] + for _ in tqdm(range(cfg.warmup_steps), desc="SIM Warmup"): + # action = env.action_space.sample() # Random action + obs, reward, terminated, truncated, info = env.step(default_action) + + grasper = FrankaPandaGrasper( + env.unwrapped.agent, + env.unwrapped.sim_config.control_freq, + ) + + layout_data = LayoutInfo.from_dict(json.load(open(cfg.layout_file, "r"))) + actions = defaultdict(list) + # Plan Grasp reach pose for each manipulated object in each env. + for env_idx in range(env.num_envs): + actors = env.unwrapped.env_actors[f"env{env_idx}"] + for node in layout_data.relation[ + Scene3DItemEnum.MANIPULATED_OBJS.value + ]: + action = grasper.compute_grasp_action( + actor=actors[node]._objs[0], + reach_target_only=True, + env_idx=env_idx, + ) + actions[node].append(action) + + # Excute the planned actions for each manipulated object in each env. + for node in actions: + max_env_steps = 0 + for env_idx in range(env.num_envs): + if actions[node][env_idx] is None: + continue + max_env_steps = max(max_env_steps, len(actions[node][env_idx])) + + action_tensor = np.ones( + (max_env_steps, env.num_envs, env.action_space.shape[-1]) + ) + action_tensor *= default_action[None, ...] + for env_idx in range(env.num_envs): + action = actions[node][env_idx] + if action is None: + continue + action_tensor[: len(action), env_idx, :] = action + + for step in tqdm(range(max_env_steps), desc=f"Grasping: {node}"): + action = torch.Tensor(action_tensor[step]).to(env.unwrapped.device) + env.unwrapped.agent.set_action(action) + obs, reward, terminated, truncated, info = env.step(action) + + env.close() + logger.info(f"Results saved in {cfg.output_dir}") + + +if __name__ == "__main__": + entrypoint() diff --git a/embodied_gen/scripts/simulate_sapien.py b/embodied_gen/scripts/simulate_sapien.py index 529b765..faa6fbf 100644 --- a/embodied_gen/scripts/simulate_sapien.py +++ b/embodied_gen/scripts/simulate_sapien.py @@ -35,6 +35,9 @@ from embodied_gen.utils.simulation import ( SIM_COORD_ALIGN, FrankaPandaGrasper, SapienSceneManager, + load_assets_from_layout_file, + load_mani_skill_robot, + render_images, ) @@ -50,8 +53,8 @@ class SapienSimConfig: default_factory=lambda: [0.7071, 0, 0, 0.7071] ) # xyzw device: str = "cuda" - robot_name: str | None = None - control_freq: int = 40 + control_freq: int = 50 + insert_robot: bool = False # Camera settings. render_interval: int = 10 num_cameras: int = 3 @@ -91,48 +94,41 @@ def entrypoint(**kwargs): layout_data = json.load(f) layout_data: LayoutInfo = LayoutInfo.from_dict(layout_data) - scene_manager.load_assets_from_layout_file( + actors = load_assets_from_layout_file( + scene_manager.scene, layout_data, cfg.z_offset, cfg.init_quat, - cfg.robot_name, - cfg.control_freq, + ) + agent = load_mani_skill_robot( + scene_manager.scene, layout_data, cfg.control_freq ) frames = defaultdict(list) image_cnt = 0 for step in tqdm(range(cfg.sim_step), desc="Simulation"): scene_manager.scene.step() - if cfg.robot_name is not None: - scene_manager.robot.reset(scene_manager.robot.init_qpos) + agent.reset(agent.init_qpos) if step % cfg.render_interval != 0: continue scene_manager.scene.update_render() image_cnt += 1 for camera in scene_manager.cameras: camera.take_picture() - images = scene_manager.render_images( - camera, render_keys=cfg.render_keys - ) + images = render_images(camera, cfg.render_keys) frames[camera.name].append(images) - grasp_frames = defaultdict(dict) - if cfg.robot_name is not None: - sim_steps_per_control = cfg.sim_freq // cfg.control_freq - control_timestep = 1.0 / cfg.control_freq + actions = dict() + if cfg.insert_robot: grasper = FrankaPandaGrasper( - scene_manager.robot, - scene_manager, - sim_steps_per_control, - control_timestep, + agent, + cfg.control_freq, ) for node in layout_data.relation[ Scene3DItemEnum.MANIPULATED_OBJS.value ]: - grasp_frames[node] = grasper.render_grasp( - scene_manager.actors[node], - scene_manager.cameras, - cfg.render_keys, + actions[node] = grasper.compute_grasp_action( + actor=actors[node], reach_target_only=True ) if "Foreground" not in cfg.render_keys: @@ -160,22 +156,36 @@ def entrypoint(**kwargs): bg_images[camera.name] = result.rgb[..., ::-1] video_frames = [] - for camera in scene_manager.cameras: + for idx, camera in enumerate(scene_manager.cameras): # Scene rendering - for step in tqdm(range(image_cnt), desc="Rendering"): - rgba = alpha_blend_rgba( - frames[camera.name][step]["Foreground"], - bg_images[camera.name], - ) - video_frames.append(np.array(rgba)) - # Grasp rendering - for node in grasp_frames: - for frame in grasp_frames[node][camera.name]: + if idx == 0: + for step in range(image_cnt): rgba = alpha_blend_rgba( - frame["Foreground"], bg_images[camera.name] + frames[camera.name][step]["Foreground"], + bg_images[camera.name], ) video_frames.append(np.array(rgba)) + # Grasp rendering + for node in actions: + if actions[node] is None: + continue + for action in tqdm(actions[node]): + grasp_frames = scene_manager.step_action( + agent, + torch.Tensor(action[None, ...]), + scene_manager.cameras, + cfg.render_keys, + sim_steps_per_control=cfg.sim_freq // cfg.control_freq, + ) + rgba = alpha_blend_rgba( + grasp_frames[camera.name][0]["Foreground"], + bg_images[camera.name], + ) + video_frames.append(np.array(rgba)) + + agent.reset(agent.init_qpos) + os.makedirs(cfg.output_dir, exist_ok=True) video_path = f"{cfg.output_dir}/Iscene.mp4" imageio.mimsave(video_path, video_frames, fps=30) diff --git a/embodied_gen/scripts/textto3d.py b/embodied_gen/scripts/textto3d.py index 03a9d2f..64475e3 100644 --- a/embodied_gen/scripts/textto3d.py +++ b/embodied_gen/scripts/textto3d.py @@ -42,7 +42,7 @@ from embodied_gen.validators.quality_checkers import ( os.environ["TOKENIZERS_PARALLELISM"] = "false" random.seed(0) -logger.info("Loading Models...") +logger.info("Loading TEXT2IMG_MODEL...") SEMANTIC_CHECKER = SemanticConsistChecker(GPT_CLIENT) SEG_CHECKER = ImageSegChecker(GPT_CLIENT) TXTGEN_CHECKER = TextGenAlignChecker(GPT_CLIENT) @@ -170,6 +170,7 @@ def text_to_3d(**kwargs) -> dict: seed=random.randint(0, 100000) if seed_3d is None else seed_3d, n_retry=args.n_asset_retry, keep_intermediate=args.keep_intermediate, + disable_decompose_convex=args.disable_decompose_convex, ) mesh_path = f"{node_save_dir}/result/mesh/{save_node}.obj" image_path = render_asset3d( @@ -270,6 +271,7 @@ def parse_args(): help="Random seed for 3D generation", ) parser.add_argument("--keep_intermediate", action="store_true") + parser.add_argument("--disable_decompose_convex", action="store_true") args, unknown = parser.parse_known_args() diff --git a/embodied_gen/utils/monkey_patches.py b/embodied_gen/utils/monkey_patches.py index 79e77d5..81076cd 100644 --- a/embodied_gen/utils/monkey_patches.py +++ b/embodied_gen/utils/monkey_patches.py @@ -18,6 +18,7 @@ import os import sys import zipfile +import numpy as np import torch from huggingface_hub import hf_hub_download from omegaconf import OmegaConf @@ -150,3 +151,68 @@ def monkey_patch_pano2room(): self.inpaint_pipe = pipe SDFTInpainter.__init__ = patched_sd_inpaint_init + + +def monkey_patch_maniskill(): + from mani_skill.envs.scene import ManiSkillScene + + def get_sensor_images( + self, obs: dict[str, any] + ) -> dict[str, dict[str, torch.Tensor]]: + sensor_data = dict() + for name, sensor in self.sensors.items(): + sensor_data[name] = sensor.get_images(obs[name]) + return sensor_data + + def get_human_render_camera_images( + self, camera_name: str = None, return_alpha: bool = False + ) -> dict[str, torch.Tensor]: + def get_rgba_tensor(camera, return_alpha): + color = camera.get_obs( + rgb=True, depth=False, segmentation=False, position=False + )["rgb"] + if return_alpha: + seg_labels = camera.get_obs( + rgb=False, depth=False, segmentation=True, position=False + )["segmentation"] + masks = np.where((seg_labels.cpu() > 0), 255, 0).astype( + np.uint8 + ) + masks = torch.tensor(masks).to(color.device) + color = torch.concat([color, masks], dim=-1) + + return color + + image_data = dict() + if self.gpu_sim_enabled: + if self.parallel_in_single_scene: + for name, camera in self.human_render_cameras.items(): + camera.camera._render_cameras[0].take_picture() + rgba = get_rgba_tensor(camera, return_alpha) + image_data[name] = rgba + else: + for name, camera in self.human_render_cameras.items(): + if camera_name is not None and name != camera_name: + continue + assert camera.config.shader_config.shader_pack not in [ + "rt", + "rt-fast", + "rt-med", + ], "ray tracing shaders do not work with parallel rendering" + camera.capture() + rgba = get_rgba_tensor(camera, return_alpha) + image_data[name] = rgba + else: + for name, camera in self.human_render_cameras.items(): + if camera_name is not None and name != camera_name: + continue + camera.capture() + rgba = get_rgba_tensor(camera, return_alpha) + image_data[name] = rgba + + return image_data + + ManiSkillScene.get_sensor_images = get_sensor_images + ManiSkillScene.get_human_render_camera_images = ( + get_human_render_camera_images + ) diff --git a/embodied_gen/utils/simulation.py b/embodied_gen/utils/simulation.py index b858a3c..36cf4c8 100644 --- a/embodied_gen/utils/simulation.py +++ b/embodied_gen/utils/simulation.py @@ -14,7 +14,7 @@ # implied. See the License for the specific language governing # permissions and limitations under the License. - +import json import logging import os import xml.etree.ElementTree as ET @@ -24,19 +24,20 @@ from typing import Literal import mplib import numpy as np import sapien.core as sapien +import sapien.physx as physx import torch -from PIL import Image, ImageColor -from scipy.spatial.transform import Rotation as R -from tqdm import tqdm -from embodied_gen.data.utils import DiffrastRender -from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum -from embodied_gen.utils.geometry import quaternion_multiply - -logger = logging.getLogger(__name__) +from mani_skill.agents.base_agent import BaseAgent +from mani_skill.envs.scene import ManiSkillScene from mani_skill.examples.motionplanning.panda.utils import ( compute_grasp_info_by_obb, ) from mani_skill.utils.geometry.trimesh_utils import get_component_mesh +from PIL import Image, ImageColor +from scipy.spatial.transform import Rotation as R +from embodied_gen.data.utils import DiffrastRender +from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum +from embodied_gen.utils.geometry import quaternion_multiply +from embodied_gen.utils.log import logger COLORMAP = list(set(ImageColor.colormap.values())) COLOR_PALETTE = np.array( @@ -54,10 +55,267 @@ SIM_COORD_ALIGN = np.array( __all__ = [ "SIM_COORD_ALIGN", "FrankaPandaGrasper", - "SimpleGrasper", + "load_assets_from_layout_file", + "load_mani_skill_robot", + "render_images", ] +def load_actor_from_urdf( + scene: ManiSkillScene | sapien.Scene, + file_path: str, + pose: sapien.Pose, + env_idx: int = None, + use_static: bool = False, + update_mass: bool = False, +) -> sapien.pysapien.Entity: + tree = ET.parse(file_path) + root = tree.getroot() + node_name = root.get("name") + file_dir = os.path.dirname(file_path) + + visual_mesh = root.find('.//visual/geometry/mesh') + visual_file = visual_mesh.get("filename") + visual_scale = visual_mesh.get("scale", "1.0 1.0 1.0") + visual_scale = np.array([float(x) for x in visual_scale.split()]) + + collision_mesh = root.find('.//collision/geometry/mesh') + collision_file = collision_mesh.get("filename") + collision_scale = collision_mesh.get("scale", "1.0 1.0 1.0") + collision_scale = np.array([float(x) for x in collision_scale.split()]) + + visual_file = os.path.join(file_dir, visual_file) + collision_file = os.path.join(file_dir, collision_file) + static_fric = root.find('.//collision/gazebo/mu1').text + dynamic_fric = root.find('.//collision/gazebo/mu2').text + + material = physx.PhysxMaterial( + static_friction=np.clip(float(static_fric), 0.1, 0.7), + dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6), + restitution=0.05, + ) + builder = scene.create_actor_builder() + + body_type = "static" if use_static else "dynamic" + builder.set_physx_body_type(body_type) + builder.add_multiple_convex_collisions_from_file( + collision_file if body_type == "dynamic" else visual_file, + material=material, + scale=collision_scale if body_type == "dynamic" else visual_scale, + # decomposition="coacd", + # decomposition_params=dict( + # threshold=0.05, max_convex_hull=64, verbose=False + # ), + ) + + builder.add_visual_from_file(visual_file, scale=visual_scale) + builder.set_initial_pose(pose) + if isinstance(scene, ManiSkillScene) and env_idx is not None: + builder.set_scene_idxs([env_idx]) + + actor = builder.build(name=f"{node_name}-{env_idx}") + + if update_mass and hasattr(actor.components[1], "mass"): + node_mass = float(root.find('.//inertial/mass').get("value")) + actor.components[1].set_mass(node_mass) + + return actor + + +def load_assets_from_layout_file( + scene: ManiSkillScene | sapien.Scene, + layout: LayoutInfo | str, + z_offset: float = 0.0, + init_quat: list[float] = [0, 0, 0, 1], + env_idx: int = None, +) -> dict[str, sapien.pysapien.Entity]: + """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene. + + Args: + scene (sapien.Scene | ManiSkillScene): The SAPIEN or ManiSkill scene to load assets into. + layout (LayoutInfo): The layout information data. + z_offset (float): Offset to apply to the Z-coordinate of non-context objects. + init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment. + env_idx (int): Environment index for multi-environment setup. + """ + if isinstance(layout, str) and layout.endswith(".json"): + layout = LayoutInfo.from_dict(json.load(open(layout, "r"))) + + actors = dict() + for node in layout.assets: + file_dir = layout.assets[node] + file_name = f"{node.replace(' ', '_')}.urdf" + urdf_file = os.path.join(file_dir, file_name) + + if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value: + continue + + position = layout.position[node].copy() + if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value: + position[2] += z_offset + + use_static = ( + layout.relation.get(Scene3DItemEnum.CONTEXT.value, None) == node + ) + + # Combine initial quaternion with object quaternion + x, y, z, qx, qy, qz, qw = position + qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat) + actor = load_actor_from_urdf( + scene, + urdf_file, + sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]), + env_idx, + use_static=use_static, + update_mass=False, + ) + actors[node] = actor + + return actors + + +def load_mani_skill_robot( + scene: sapien.Scene | ManiSkillScene, + layout: LayoutInfo | str, + control_freq: int = 20, + robot_init_qpos_noise: float = 0.0, + control_mode: str = "pd_joint_pos", + backend_str: tuple[str, str] = ("cpu", "gpu"), +) -> BaseAgent: + from mani_skill.agents import REGISTERED_AGENTS + from mani_skill.envs.scene import ManiSkillScene + from mani_skill.envs.utils.system.backend import ( + parse_sim_and_render_backend, + ) + + if isinstance(layout, str) and layout.endswith(".json"): + layout = LayoutInfo.from_dict(json.load(open(layout, "r"))) + + robot_name = layout.relation[Scene3DItemEnum.ROBOT.value] + x, y, z, qx, qy, qz, qw = layout.position[robot_name] + delta_z = 0.002 # Add small offset to avoid collision. + pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz]) + + if robot_name not in REGISTERED_AGENTS: + logger.warning( + f"Robot `{robot_name}` not registered, chosen from {REGISTERED_AGENTS.keys()}, use `panda` instead." + ) + robot_name = "panda" + + ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls + backend = parse_sim_and_render_backend(*backend_str) + if isinstance(scene, sapien.Scene): + scene = ManiSkillScene([scene], device=backend_str[0], backend=backend) + robot = ROBOT_CLS( + scene=scene, + control_freq=control_freq, + control_mode=control_mode, + initial_pose=pose, + ) + + # Set robot init joint rad agree(joint0 to joint6 w 2 finger). + qpos = np.array( + [ + 0.0, + np.pi / 8, + 0, + -np.pi * 3 / 8, + 0, + np.pi * 3 / 4, + np.pi / 4, + 0.04, + 0.04, + ] + ) + qpos = ( + np.random.normal( + 0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos)) + ) + + qpos + ) + qpos[:, -2:] = 0.04 + robot.reset(qpos) + robot.init_qpos = robot.robot.qpos + robot.controller.controllers["gripper"].reset() + + return robot + + +def render_images( + camera: sapien.render.RenderCameraComponent, + render_keys: list[ + Literal[ + "Color", + "Segmentation", + "Normal", + "Mask", + "Depth", + "Foreground", + ] + ] = None, +) -> dict[str, Image.Image]: + """Render images from a given sapien camera. + + Args: + camera (sapien.render.RenderCameraComponent): The camera to render from. + render_keys (List[str]): Types of images to render (e.g., Color, Segmentation). + + Returns: + Dict[str, Image.Image]: Dictionary of rendered images. + """ + if render_keys is None: + render_keys = [ + "Color", + "Segmentation", + "Normal", + "Mask", + "Depth", + "Foreground", + ] + + results: dict[str, Image.Image] = {} + if "Color" in render_keys: + color = camera.get_picture("Color") + color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8) + results["Color"] = Image.fromarray(color_rgb) + + if "Mask" in render_keys: + alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8) + results["Mask"] = Image.fromarray(alpha) + + if "Segmentation" in render_keys: + seg_labels = camera.get_picture("Segmentation") + label0 = seg_labels[..., 0].astype(np.uint8) + seg_color = COLOR_PALETTE[label0] + results["Segmentation"] = Image.fromarray(seg_color) + + if "Foreground" in render_keys: + seg_labels = camera.get_picture("Segmentation") + label0 = seg_labels[..., 0] + mask = np.where((label0 > 1), 255, 0).astype(np.uint8) + color = camera.get_picture("Color") + color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8) + foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1) + results["Foreground"] = Image.fromarray(foreground) + + if "Normal" in render_keys: + normal = camera.get_picture("Normal")[..., :3] + normal_img = (((normal + 1) / 2) * 255).astype(np.uint8) + results["Normal"] = Image.fromarray(normal_img) + + if "Depth" in render_keys: + position_map = camera.get_picture("Position") + depth = -position_map[..., 2] + alpha = torch.tensor(color[..., 3], dtype=torch.float32) + norm_depth = DiffrastRender.normalize_map_by_mask( + torch.tensor(depth), alpha + ) + depth_img = (norm_depth * 255).to(torch.uint8).numpy() + results["Depth"] = Image.fromarray(depth_img) + + return results + + class SapienSceneManager: """A class to manage SAPIEN simulator.""" @@ -81,11 +339,6 @@ class SapienSceneManager: sapien.render.set_ray_tracing_path_depth(10) sapien.render.set_ray_tracing_denoiser("oidn") - # TODO: Support GPU. - # import sapien.physx as physx - # physx.enable_gpu() - # scene = sapien.Scene(systems=[sapien.physx.PhysxGpuSystem(), sapien.render.RenderSystem()]) - scene = sapien.Scene() scene.set_timestep(1 / self.sim_freq) @@ -114,12 +367,13 @@ class SapienSceneManager: def step_action( self, + agent: BaseAgent, action: torch.Tensor, - sim_steps_per_control: int, cameras: list[sapien.render.RenderCameraComponent], render_keys: list[str], + sim_steps_per_control: int = 1, ) -> dict: - self.robot.set_action(action) + agent.set_action(action) frames = defaultdict(list) for _ in range(sim_steps_per_control): self.scene.step() @@ -127,125 +381,11 @@ class SapienSceneManager: self.scene.update_render() for camera in cameras: camera.take_picture() - images = self.render_images(camera, render_keys=render_keys) + images = render_images(camera, render_keys=render_keys) frames[camera.name].append(images) return frames - def load_exists_robot( - self, - robot_name: str, - pose: sapien.Pose, - control_freq: int = 20, - robot_init_qpos_noise: float = 0.02, - control_mode: str = "pd_joint_pos", - backend_str: tuple[str, str] = ("cpu", "gpu"), - ): - from mani_skill.agents import REGISTERED_AGENTS - from mani_skill.envs.scene import ManiSkillScene - from mani_skill.envs.utils.system.backend import ( - parse_sim_and_render_backend, - ) - - if robot_name == "franka": - robot_name = "panda" # remapping - if robot_name not in REGISTERED_AGENTS: - raise KeyError( - f"Robot `{robot_name}` not registered, chosen from {REGISTERED_AGENTS.keys()}." - ) - - ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls - # backend = parse_sim_and_render_backend("physx_cuda", "gpu") - backend = parse_sim_and_render_backend(*backend_str) - scene = ManiSkillScene( - [self.scene], device=backend_str[0], backend=backend - ) - robot = ROBOT_CLS( - scene=scene, - control_freq=control_freq, - control_mode=control_mode, - initial_pose=pose, - ) - # Set robot init joint rad agree(joint0 to joint6 w 2 finger). - qpos = np.array( - [ - 0.0, - np.pi / 8, - 0, - -np.pi * 3 / 8, - 0, - np.pi * 3 / 4, - np.pi / 4, - 0.04, - 0.04, - ] - ) - qpos = ( - np.random.normal( - 0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos)) - ) - + qpos - ) - qpos[:, -2:] = 0.04 - robot.reset(qpos) - robot.init_qpos = robot.robot.qpos - robot.controller.controllers["gripper"].reset() - - return robot - - def load_actor_from_urdf( - self, - file_path: str, - pose: sapien.Pose, - use_static: bool = False, - update_mass: bool = False, - ) -> sapien.pysapien.Entity: - """Load an actor from a URDF file into the scene. - - Args: - file_path (str): Path to the URDF file. - pose (sapien.Pose): Pose of the actor. - use_static (bool): Whether to create a static actor. - update_mass (bool): Whether to update the actor's mass from the URDF. - - Returns: - sapien.pysapien.Entity: The loaded actor. - """ - tree = ET.parse(file_path) - root = tree.getroot() - node_name = root.get("name") - file_dir = os.path.dirname(file_path) - visual_file = root.find('.//visual/geometry/mesh').get("filename") - collision_file = root.find('.//collision/geometry/mesh').get( - "filename" - ) - visual_file = os.path.join(file_dir, visual_file) - collision_file = os.path.join(file_dir, collision_file) - static_fric = root.find('.//collision/gazebo/mu1').text - dynamic_fric = root.find('.//collision/gazebo/mu2').text - - material = self.scene.create_physical_material( - static_friction=np.clip(float(static_fric), 0.1, 0.7), - dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6), - restitution=0.05, - ) - builder = self.scene.create_actor_builder() - body_type = "static" if use_static else "dynamic" - builder.set_physx_body_type(body_type) - builder.add_multiple_convex_collisions_from_file( - collision_file, material=material - ) - builder.add_visual_from_file(visual_file) - actor = builder.build(name=node_name) - actor.set_name(node_name) - actor.set_pose(pose) - - if update_mass and hasattr(actor.components[1], "mass"): - node_mass = float(root.find('.//inertial/mass').get("value")) - actor.components[1].set_mass(node_mass) - - return actor - def create_camera( self, cam_name: str, @@ -344,257 +484,78 @@ class SapienSceneManager: return self.cameras - def render_images( - self, - camera: sapien.render.RenderCameraComponent, - render_keys: list[ - Literal[ - "Color", - "Segmentation", - "Normal", - "Mask", - "Depth", - "Foreground", - ] - ] = None, - ) -> dict[str, Image.Image]: - """Render images from a given camera. - - Args: - camera (sapien.render.RenderCameraComponent): The camera to render from. - render_keys (List[str]): Types of images to render (e.g., Color, Segmentation). - - Returns: - Dict[str, Image.Image]: Dictionary of rendered images. - """ - if render_keys is None: - render_keys = [ - "Color", - "Segmentation", - "Normal", - "Mask", - "Depth", - "Foreground", - ] - - results: dict[str, Image.Image] = {} - if "Color" in render_keys: - color = camera.get_picture("Color") - color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8) - results["Color"] = Image.fromarray(color_rgb) - - if "Mask" in render_keys: - alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8) - results["Mask"] = Image.fromarray(alpha) - - if "Segmentation" in render_keys: - seg_labels = camera.get_picture("Segmentation") - label0 = seg_labels[..., 0].astype(np.uint8) - seg_color = self.COLOR_PALETTE[label0] - results["Segmentation"] = Image.fromarray(seg_color) - - if "Foreground" in render_keys: - seg_labels = camera.get_picture("Segmentation") - label0 = seg_labels[..., 0] - mask = np.where((label0 > 1), 255, 0).astype(np.uint8) - color = camera.get_picture("Color") - color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8) - foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1) - results["Foreground"] = Image.fromarray(foreground) - - if "Normal" in render_keys: - normal = camera.get_picture("Normal")[..., :3] - normal_img = (((normal + 1) / 2) * 255).astype(np.uint8) - results["Normal"] = Image.fromarray(normal_img) - - if "Depth" in render_keys: - position_map = camera.get_picture("Position") - depth = -position_map[..., 2] - alpha = torch.tensor(color[..., 3], dtype=torch.float32) - norm_depth = DiffrastRender.normalize_map_by_mask( - torch.tensor(depth), alpha - ) - depth_img = (norm_depth * 255).to(torch.uint8).numpy() - results["Depth"] = Image.fromarray(depth_img) - - return results - - def load_assets_from_layout_file( - self, - layout: LayoutInfo, - z_offset: float = 0.0, - init_quat: list[float] = [0, 0, 0, 1], - robot_name: str = None, - control_freq: int = 40, - robot_init_qpos_noise: float = 0, - ) -> None: - """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene. - - Args: - layout (LayoutInfo): The layout information data. - z_offset (float): Offset to apply to the Z-coordinate of non-context objects. - init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment. - robot_name (str): Name of the robot to load, None not load robot. - control_freq (int): Control frequency for the robot, default is 40. - robot_init_qpos_noise (float): Noise to add to the robot's initial joint positions, default is 0. - """ - for node in layout.assets: - file_dir = layout.assets[node] - file_name = f"{node.replace(' ', '_')}.urdf" - urdf_file = os.path.join(file_dir, file_name) - - if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value: - continue - - position = layout.position[node].copy() - if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value: - position[2] += z_offset - - use_static = ( - layout.relation.get(Scene3DItemEnum.CONTEXT.value, None) - == node - ) - - # Combine initial quaternion with object quaternion - x, y, z, qx, qy, qz, qw = position - qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat) - actor = self.load_actor_from_urdf( - urdf_file, - sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]), - use_static=use_static, - update_mass=False, - ) - self.actors[node] = actor - - if robot_name is not None: - robot_node = layout.relation[Scene3DItemEnum.ROBOT.value] - x, y, z, qx, qy, qz, qw = layout.position[robot_node] - delta_z = 0.002 # Add small offset to avoid collision. - pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz]) - self.robot = self.load_exists_robot( - robot_name, pose, control_freq, robot_init_qpos_noise - ) - - return - class FrankaPandaGrasper(object): - """Grasper planner for Franka Panda robot arm. - - Args: - robot: See from `mani_skill.agents.robots.panda.panda`. - scene: The `SapienSceneManager` object. - sim_steps_per_control: Number of simulation steps per control step. - control_timestep: Time step for planning. - joint_vel_limits: Velocity limit for all joints (default: 0.9) - joint_acc_limits: Acceleration limit for all joints (default: 0.9) - finger_length: Length of the robot's fingers (default: 0.025) - """ - def __init__( self, - robot, - scene: SapienSceneManager, - sim_steps_per_control: int, - control_timestep: float, - joint_vel_limits: float = 0.9, - joint_acc_limits: float = 0.9, + agent: BaseAgent, + control_freq: float, + joint_vel_limits: float = 2.0, + joint_acc_limits: float = 1.0, finger_length: float = 0.025, ) -> None: - self.agent = robot - self.robot = robot.robot - self.scene = scene - self.sim_steps_per_control = sim_steps_per_control - self.control_timestep = control_timestep + self.agent = agent + self.robot = agent.robot + self.control_freq = control_freq + self.control_timestep = 1 / control_freq self.joint_vel_limits = joint_vel_limits self.joint_acc_limits = joint_acc_limits self.finger_length = finger_length - self.planner = self._setup_planner() + self.planners = self._setup_planner() def _setup_planner(self) -> mplib.Planner: - link_names = [link.get_name() for link in self.robot.get_links()] - joint_names = [ - joint.get_name() for joint in self.robot.get_active_joints() - ] - - planner = mplib.Planner( - urdf=self.agent.urdf_path, - srdf=self.agent.urdf_path.replace(".urdf", ".srdf"), - user_link_names=link_names, - user_joint_names=joint_names, - move_group="panda_hand_tcp", - joint_vel_limits=np.ones(7) * self.joint_vel_limits, - joint_acc_limits=np.ones(7) * self.joint_acc_limits, - ) - planner.set_base_pose(self.robot.pose.raw_pose[0].tolist()) - - return planner - - def update_state( - self, - result: dict, - gripper_state: Literal[-1, 1], - sim_steps_per_control: int, - cameras: list[sapien.render.RenderCameraComponent], - render_keys: list[str], - ) -> dict[str, list[Image.Image]]: - n_step = len(result["position"]) - total_frames = defaultdict(list) - for i in tqdm(range(n_step), desc="Grasping"): - qpos = result["position"][min(i, n_step - 1)] - action = np.hstack([qpos, gripper_state])[None, ...] - action = torch.from_numpy(action).float() - frames = self.scene.step_action( - action, sim_steps_per_control, cameras, render_keys + planners = [] + for pose in self.robot.pose: + link_names = [link.get_name() for link in self.robot.get_links()] + joint_names = [ + joint.get_name() for joint in self.robot.get_active_joints() + ] + planner = mplib.Planner( + urdf=self.agent.urdf_path, + srdf=self.agent.urdf_path.replace(".urdf", ".srdf"), + user_link_names=link_names, + user_joint_names=joint_names, + move_group="panda_hand_tcp", + joint_vel_limits=np.ones(7) * self.joint_vel_limits, + joint_acc_limits=np.ones(7) * self.joint_acc_limits, ) - for camera in cameras: - total_frames[camera.name].extend(frames[camera.name]) + planner.set_base_pose(pose.raw_pose[0].tolist()) + planners.append(planner) - return total_frames + return planners def control_gripper( self, - sim_steps_per_control: int, - cameras: list[sapien.render.RenderCameraComponent], - render_keys: list[str], gripper_state: Literal[-1, 1], n_step: int = 10, - ) -> dict[str, list[Image.Image]]: - total_frames = defaultdict(list) + ) -> np.ndarray: qpos = self.robot.get_qpos()[0, :-2].cpu().numpy() + actions = [] for _ in range(n_step): action = np.hstack([qpos, gripper_state])[None, ...] - action = torch.from_numpy(action).float() - frames = self.scene.step_action( - action, sim_steps_per_control, cameras, render_keys - ) - for camera in cameras: - total_frames[camera.name].extend(frames[camera.name]) + actions.append(action) - return total_frames + return np.concatenate(actions, axis=0) def move_to_pose( self, pose: sapien.Pose, - sim_steps_per_control: int, control_timestep: float, - cameras: list[sapien.render.RenderCameraComponent], - render_keys: list[str], gripper_state: Literal[-1, 1], use_point_cloud: bool = False, n_max_step: int = 100, - ) -> dict[str, list[Image.Image]]: - # First try screw motion planning - result = self.planner.plan_screw( + action_key: str = "position", + env_idx: int = 0, + ) -> np.ndarray: + result = self.planners[env_idx].plan_qpos_to_pose( np.concatenate([pose.p, pose.q]), self.robot.get_qpos().cpu().numpy()[0], time_step=control_timestep, use_point_cloud=use_point_cloud, ) - # Fallback to standard planning if screw fails if result["status"] != "Success": - result = self.planner.plan_qpos_to_pose( + result = self.planners[env_idx].plan_screw( np.concatenate([pose.p, pose.q]), self.robot.get_qpos().cpu().numpy()[0], time_step=control_timestep, @@ -602,44 +563,34 @@ class FrankaPandaGrasper(object): ) if result["status"] != "Success": - return defaultdict(list) + return - for key in result: - if key in ["status", "duration"]: - continue - sample_ratio = (len(result[key]) // n_max_step) + 1 - result[key] = result[key][::sample_ratio] + sample_ratio = (len(result[action_key]) // n_max_step) + 1 + result[action_key] = result[action_key][::sample_ratio] - frames = self.update_state( - result, gripper_state, sim_steps_per_control, cameras, render_keys - ) + n_step = len(result[action_key]) + actions = [] + for i in range(n_step): + qpos = result[action_key][i] + action = np.hstack([qpos, gripper_state])[None, ...] + actions.append(action) - return frames + return np.concatenate(actions, axis=0) - def render_grasp( + def compute_grasp_action( self, actor: sapien.pysapien.Entity, - cameras: list[sapien.render.RenderCameraComponent], - render_keys: list[ - Literal[ - "Color", - "Foreground", - "Segmentation", - "Normal", - "Mask", - "Depth", - ] - ], reach_target_only: bool = True, - ) -> dict[str, list[Image.Image]]: + offset: tuple[float, float, float] = [0, 0, -0.05], + env_idx: int = 0, + ) -> np.ndarray: physx_rigid = actor.components[1] mesh = get_component_mesh(physx_rigid, to_world_frame=True) obb = mesh.bounding_box_oriented approaching = np.array([0, 0, -1]) + tcp_pose = self.agent.tcp.pose[env_idx] target_closing = ( - self.agent.tcp.pose.to_transformation_matrix()[0, :3, 1] - .cpu() - .numpy() + tcp_pose.to_transformation_matrix()[0, :3, 1].cpu().numpy() ) grasp_info = compute_grasp_info_by_obb( obb, @@ -647,60 +598,46 @@ class FrankaPandaGrasper(object): target_closing=target_closing, depth=self.finger_length, ) - closing, center = grasp_info["closing"], grasp_info["center"] - raw_tcp_pose = self.agent.tcp.pose.sp - grasp_pose = self.agent.build_grasp_pose(approaching, closing, center) - reach_pose = grasp_pose * sapien.Pose(p=[0, 0, -0.05]) - grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01]) - total_frames = defaultdict(list) - frames_list = [] - reach_frames = self.move_to_pose( + closing, center = grasp_info["closing"], grasp_info["center"] + raw_tcp_pose = tcp_pose.sp + grasp_pose = self.agent.build_grasp_pose(approaching, closing, center) + reach_pose = grasp_pose * sapien.Pose(p=offset) + grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01]) + actions = [] + reach_actions = self.move_to_pose( reach_pose, - self.sim_steps_per_control, self.control_timestep, - cameras, - render_keys, gripper_state=1, + env_idx=env_idx, ) - frames_list.append(reach_frames) - if len(reach_frames) == 0: + actions.append(reach_actions) + + if reach_actions is None: logger.warning( f"Failed to reach the grasp pose for node `{actor.name}`, skipping grasping." ) - return total_frames + return None if not reach_target_only: - grasp_frames = self.move_to_pose( + grasp_actions = self.move_to_pose( grasp_pose, - self.sim_steps_per_control, self.control_timestep, - cameras, - render_keys, gripper_state=1, + env_idx=env_idx, ) - frames_list.append(grasp_frames) - close_frames = self.control_gripper( - self.sim_steps_per_control, - cameras, - render_keys, + actions.append(grasp_actions) + close_actions = self.control_gripper( gripper_state=-1, + env_idx=env_idx, ) - frames_list.append(close_frames) - back_frames = self.move_to_pose( + actions.append(close_actions) + back_actions = self.move_to_pose( raw_tcp_pose, - self.sim_steps_per_control, self.control_timestep, - cameras, - render_keys, gripper_state=-1, + env_idx=env_idx, ) - frames_list.append(back_frames) + actions.append(back_actions) - for frame_dict in frames_list: - for cam_name, frames in frame_dict.items(): - total_frames[cam_name].extend(frames) - - self.agent.reset(self.agent.init_qpos) - - return total_frames + return np.concatenate(actions, axis=0) diff --git a/embodied_gen/utils/tags.py b/embodied_gen/utils/tags.py index 35b2387..4e6a7d5 100644 --- a/embodied_gen/utils/tags.py +++ b/embodied_gen/utils/tags.py @@ -1 +1 @@ -VERSION = "v0.1.3" +VERSION = "v0.1.4" diff --git a/embodied_gen/validators/urdf_convertor.py b/embodied_gen/validators/urdf_convertor.py index 8940ed8..0fa9461 100644 --- a/embodied_gen/validators/urdf_convertor.py +++ b/embodied_gen/validators/urdf_convertor.py @@ -24,6 +24,7 @@ from xml.dom.minidom import parseString import numpy as np import trimesh +from embodied_gen.data.convex_decomposer import decompose_convex_mesh from embodied_gen.utils.gpt_clients import GPT_CLIENT, GPTclient from embodied_gen.utils.process_media import render_asset3d from embodied_gen.utils.tags import VERSION @@ -84,6 +85,7 @@ class URDFGenerator(object): attrs_name: list[str] = None, render_dir: str = "urdf_renders", render_view_num: int = 4, + decompose_convex: bool = False, ) -> None: if mesh_file_list is None: mesh_file_list = [] @@ -132,8 +134,9 @@ class URDFGenerator(object): Estimate the vertical projection of their real length based on its pose. For example: - A pen standing upright in the first image (aligned with the image's vertical axis) - full body visible in the first image: → vertical height ≈ 0.14-0.20 m - - A pen lying flat in the first image (showing thickness or as a dot) → vertical height ≈ 0.018-0.025 m + full body visible in the first image: → vertical height ≈ 0.14-0.20 m + - A pen lying flat in the first image or either the tip or the tail is facing the image + (showing thickness or as a circle) → vertical height ≈ 0.018-0.025 m - Tilted pen in the first image (e.g., ~45° angle): vertical height ≈ 0.07-0.12 m - Use the rest views to help determine the object's 3D pose and orientation. Assume the object is in real-world scale and estimate the approximate vertical height @@ -156,6 +159,7 @@ class URDFGenerator(object): "gs_model", ] self.attrs_name = attrs_name + self.decompose_convex = decompose_convex def parse_response(self, response: str) -> dict[str, any]: lines = response.split("\n") @@ -258,9 +262,24 @@ class URDFGenerator(object): # Update collision geometry collision = link.find("collision/geometry/mesh") if collision is not None: - collision.set( - "filename", os.path.join(self.output_mesh_dir, obj_name) - ) + collision_mesh = os.path.join(self.output_mesh_dir, obj_name) + if self.decompose_convex: + try: + d_params = dict( + threshold=0.05, max_convex_hull=64, verbose=False + ) + filename = f"{os.path.splitext(obj_name)[0]}_collision.ply" + output_path = os.path.join(mesh_folder, filename) + decompose_convex_mesh( + mesh_output_path, output_path, **d_params + ) + collision_mesh = f"{self.output_mesh_dir}/{filename}" + except Exception as e: + logger.warning( + f"Convex decomposition failed for {output_path}, {e}." + "Use original mesh for collision computation." + ) + collision.set("filename", collision_mesh) collision.set("scale", "1.0 1.0 1.0") # Update friction coefficients diff --git a/install/install_basic.sh b/install/install_basic.sh index dc90446..a81c836 100644 --- a/install/install_basic.sh +++ b/install/install_basic.sh @@ -9,10 +9,10 @@ PIP_INSTALL_PACKAGES=( "xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118" "flash-attn==2.7.0.post2 --no-build-isolation" "-r requirements.txt --use-deprecated=legacy-resolver" - "utils3d@git+https://github.com/EasternJournalist/utils3d.git#egg=9a4eb15" + "utils3d@git+https://github.com/EasternJournalist/utils3d.git@9a4eb15" "clip@git+https://github.com/openai/CLIP.git" - "segment-anything@git+https://github.com/facebookresearch/segment-anything.git#egg=dca509f" - "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git#egg=729261d" + "segment-anything@git+https://github.com/facebookresearch/segment-anything.git@dca509f" + "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git@729261d" "kolors@git+https://github.com/HochCC/Kolors.git" "kaolin@git+https://github.com/NVIDIAGameWorks/kaolin.git@v0.16.0" "git+https://github.com/nerfstudio-project/gsplat.git@v1.5.3" diff --git a/pyproject.toml b/pyproject.toml index cf14b78..12b7da5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -7,7 +7,7 @@ packages = ["embodied_gen"] [project] name = "embodied_gen" -version = "v0.1.3" +version = "v0.1.4" readme = "README.md" license = "Apache-2.0" license-files = ["LICENSE", "NOTICE"] diff --git a/requirements.txt b/requirements.txt index 3fa2137..71507fe 100644 --- a/requirements.txt +++ b/requirements.txt @@ -38,5 +38,6 @@ tyro pyquaternion shapely sapien==3.0.0b1 +coacd mani_skill==3.0.0b21 typing_extensions==4.14.1 diff --git a/tests/test_examples/test_urdf_convertor.py b/tests/test_examples/test_urdf_convertor.py index 10f4dab..3795e3b 100644 --- a/tests/test_examples/test_urdf_convertor.py +++ b/tests/test_examples/test_urdf_convertor.py @@ -39,6 +39,7 @@ def test_urdf_convertor(): "outputs/test_urdf/notebook/result/mesh/notebook.obj", "outputs/test_urdf/marker/result/mesh/marker.obj", "outputs/test_urdf/pen2/result/mesh/pen.obj", + "outputs/test_urdf/pen3/result/mesh/pen.obj", ] for idx, mesh_path in enumerate(mesh_paths): filename = mesh_path.split("/")[-1].split(".")[0] @@ -49,3 +50,19 @@ def test_urdf_convertor(): # min_height=1.0, # max_height=1.2, ) + + +def test_decompose_convex_mesh(): + urdf_gen = URDFGenerator(GPT_CLIENT, decompose_convex=True) + mesh_paths = [ + "outputs/test_urdf/sample_0/mesh/pen.obj", + "outputs/test_urdf/sample_1/mesh/notepad.obj", + "outputs/test_urdf/sample_2/mesh/plate.obj", + ] + for idx, mesh_path in enumerate(mesh_paths): + filename = mesh_path.split("/")[-1].split(".")[0] + urdf_path = urdf_gen( + mesh_path=mesh_path, + output_root=f"outputs/test_urdf3/sample_{idx}", + category=filename, + ) diff --git a/tests/test_unit/test_mesh_process.py b/tests/test_unit/test_mesh_process.py new file mode 100644 index 0000000..7a9f982 --- /dev/null +++ b/tests/test_unit/test_mesh_process.py @@ -0,0 +1,44 @@ +# Project EmbodiedGen +# +# Copyright (c) 2025 Horizon Robotics. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or +# implied. See the License for the specific language governing +# permissions and limitations under the License. + + +import logging +import tempfile +from time import time + +import pytest +from embodied_gen.data.convex_decomposer import decompose_convex_mesh + +logger = logging.getLogger(__name__) + + +@pytest.mark.parametrize( + "input_mesh_path, max_convex_hull", + [ + ("apps/assets/example_texture/meshes/robot_text.obj", 8), + # ("apps/assets/example_texture/meshes/robot_text.obj", 32), + # ("apps/assets/example_texture/meshes/robot_text.obj", 64), + # ("apps/assets/example_texture/meshes/robot_text.obj", 128), + ], +) +def test_decompose_convex_mesh(input_mesh_path, max_convex_hull): + d_params = dict( + threshold=0.05, max_convex_hull=max_convex_hull, verbose=False + ) + with tempfile.NamedTemporaryFile(suffix='.ply', delete=True) as tmp_file: + start_time = time() + decompose_convex_mesh(input_mesh_path, tmp_file.name, **d_params) + logger.info(f"Finished in {time()-start_time:.2f}s")