feat(sim): Optimize support for downstream simulators and gym. (#36)

* feat(sim): Optimize support for downstream simulators and gym.
* docs: update docs
* update version

parent e06900fe57
commit 0ca4a4894f
@@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0).

+## [0.1.4] - 2025-08-29
+
+### Feature
+
+- Supports importing the output of the EmbodiedGen Layout Generation module with a single `gym.make` command to create multiple parallel simulation environments and interactive 3D worlds.
+- Supports CoACD-based mesh decomposition for better efficiency and accuracy in physical simulation.

## [0.1.3] - 2025-08-19

### Feature

- Release `layout-cli` 🏞️ Layout (Interactive 3D Worlds) Generation: generate interactive 3D scenes from a task description.
README.md  (51 lines changed)
@@ -30,7 +30,7 @@
```sh
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
-git checkout v0.1.3
+git checkout v0.1.4
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y  # recommended to use a new env.
conda activate embodiedgen
@@ -67,7 +67,8 @@ You can choose between two backends for the GPT agent:

[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D) Generate physically plausible 3D asset URDF from a single input image, offering high-quality support for digital twin systems.
+(HF space is a simplified demonstration. For the full functionality, please refer to `img3d-cli`.)
-<img src="apps/assets/image_to_3d.jpg" alt="Image to 3D" width="900">
+<img src="apps/assets/image_to_3d.jpg" alt="Image to 3D" width="700">

### ☁️ Service
Run the image-to-3D generation service locally.
@@ -95,7 +96,7 @@ img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/examp

[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D) Create 3D assets from text descriptions for a wide range of geometry and styles. (HF space is a simplified demonstration. For the full functionality, please refer to `text3d-cli`.)

-<img src="apps/assets/text_to_3d.jpg" alt="Text to 3D" width="900">
+<img src="apps/assets/text_to_3d.jpg" alt="Text to 3D" width="700">

### ☁️ Service
Deploy the text-to-3D generation service locally.
@@ -133,7 +134,7 @@ ps: models with more permissive licenses found in `embodied_gen/models/image_com

[](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen) Generate visually rich textures for 3D meshes.

-<img src="apps/assets/texture_gen.jpg" alt="Texture Gen" width="900">
+<img src="apps/assets/texture_gen.jpg" alt="Texture Gen" width="700">

### ☁️ Service
@@ -161,7 +162,7 @@ bash embodied_gen/scripts/texture_gen.sh \

<h2 id="3d-scene-generation">🌍 3D Scene Generation</h2>

-<img src="apps/assets/scene3d.gif" alt="scene3d" style="width: 640px;">
+<img src="apps/assets/scene3d.gif" alt="scene3d" style="width: 600px;">

### ⚡ API
> Run `bash install.sh extra` to install additional requirements if you need to use `scene3d-cli`.
@@ -184,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \

🚧 *Coming Soon*

-<img src="apps/assets/articulate.gif" alt="articulate" style="width: 480px;">
+<img src="apps/assets/articulate.gif" alt="articulate" style="width: 430px;">

---
@@ -196,12 +197,12 @@ CUDA_VISIBLE_DEVICES=0 scene3d-cli \

<table>
<tr>
-<td><img src="apps/assets/layout1.gif" alt="layout1" width="360"/></td>
-<td><img src="apps/assets/layout2.gif" alt="layout2" width="360"/></td>
+<td><img src="apps/assets/layout1.gif" alt="layout1" width="320"/></td>
+<td><img src="apps/assets/layout2.gif" alt="layout2" width="320"/></td>
</tr>
<tr>
-<td><img src="apps/assets/layout3.gif" alt="layout3" width="360"/></td>
-<td><img src="apps/assets/layout4.gif" alt="layout4" width="360"/></td>
+<td><img src="apps/assets/layout3.gif" alt="layout3" width="320"/></td>
+<td><img src="apps/assets/layout4.gif" alt="layout4" width="320"/></td>
</tr>
</table>
@@ -219,8 +220,8 @@ layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on

<table>
<tr>
-<td><img src="apps/assets/Iscene_demo1.gif" alt="Iscene_demo1" width="300"/></td>
-<td><img src="apps/assets/Iscene_demo2.gif" alt="Iscene_demo2" width="450"/></td>
+<td><img src="apps/assets/Iscene_demo1.gif" alt="Iscene_demo1" width="234"/></td>
+<td><img src="apps/assets/Iscene_demo2.gif" alt="Iscene_demo2" width="350"/></td>
</tr>
</table>
@@ -233,7 +234,7 @@ CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
    --output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &
```

-Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.)
+Using `compose_layout.py`, you can recompose the layout of the generated interactive 3D scenes.
```sh
python embodied_gen/scripts/compose_layout.py \
    --layout_path "outputs/layouts_gens/task_0000/layout.json" \
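The layout consumed above is a plain JSON file, so programmatic recomposition boils down to load, edit, save. A minimal stand-alone sketch; the `position` key and the 7-value pose layout `[x, y, z, qx, qy, qz, qw]` are assumptions inferred from this commit's code, not a documented schema:

```python
import json
import os
import tempfile

# Hypothetical minimal layout dict; the real LayoutInfo schema may differ.
layout = {
    "position": {"mug": [0.2, 0.0, 0.8, 0.0, 0.0, 0.0, 1.0]},
    "assets": {"mug": "outputs/assets/mug"},
}

path = os.path.join(tempfile.mkdtemp(), "layout.json")
with open(path, "w") as f:
    json.dump(layout, f)

# "Recompose": shift the mug 5 cm along +x, then write the edited layout back.
with open(path) as f:
    edited = json.load(f)
edited["position"]["mug"][0] += 0.05
with open(path, "w") as f:
    json.dump(edited, f)
```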
@@ -243,13 +244,29 @@ python embodied_gen/scripts/compose_layout.py \

We provide `sim-cli`, which allows users to easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (support for more simulators will be added in future updates).

```sh
-sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \
-    --output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka"
+sim-cli --layout_path "outputs/layouts_gen/task_0000/layout.json" \
+    --output_dir "outputs/layouts_gen/task_0000/sapien_render" --insert_robot
```

+Example: generate multiple parallel simulation environments with `gym.make` and record sensor and trajectory data.
+
+<table>
+<tr>
+<td><img src="apps/assets/parallel_sim.gif" alt="parallel_sim1" width="290"/></td>
+<td><img src="apps/assets/parallel_sim2.gif" alt="parallel_sim2" width="290"/></td>
+</tr>
+</table>
+
+```sh
+python embodied_gen/scripts/parallel_sim.py \
+    --layout_file "outputs/layouts_gen/task_0000/layout.json" \
+    --output_dir "outputs/parallel_sim/task_0000" \
+    --num_envs 20
+```

### 🖼️ Real-to-Sim Digital Twin

-<img src="apps/assets/real2sim_mujoco.gif" alt="real2sim_mujoco" style="width: 512px;">
+<img src="apps/assets/real2sim_mujoco.gif" alt="real2sim_mujoco" width="400">

---
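Under the hood, `parallel_sim.py` resets N environments and steps them in lockstep while recording batched per-step observations. The loop can be sketched self-contained with a stub vectorized environment standing in for the real SAPIEN-backed `PickEmbodiedGen-v1` (all names here are illustrative, not the project's API):

```python
import numpy as np

class StubBatchEnv:
    """Stand-in for a vectorized env: state is one scalar per env."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.state = np.zeros(num_envs)

    def reset(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.state = rng.uniform(-0.1, 0.1, self.num_envs)
        return self.state.copy(), {}

    def step(self, action: np.ndarray):
        self.state = self.state + action  # trivial batched dynamics
        return self.state.copy(), np.zeros(self.num_envs), False, False, {}

num_envs, steps = 4, 10
env = StubBatchEnv(num_envs)
obs, _ = env.reset(seed=0)

trajectory = [obs]  # record the batched observation at every step
for _ in range(steps):
    action = np.full(num_envs, 0.01)  # same small action for every env
    obs, reward, terminated, truncated, _ = env.step(action)
    trajectory.append(obs)

trajectory = np.stack(trajectory)  # shape: (steps + 1, num_envs)
```

The real script additionally wraps the env in ManiSkill's `RecordEpisode` to dump videos and trajectories; the recording pattern is the same: accumulate per-step batched outputs, then stack.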
@@ -280,7 +297,7 @@ If you use EmbodiedGen in your research or projects, please cite:

## 🙌 Acknowledgement

EmbodiedGen builds upon the following amazing projects and models:
-🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
+🌟 [Trellis](https://github.com/microsoft/TRELLIS) | 🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | 🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) | 🌟 [Rembg](https://github.com/danielgatis/rembg) | 🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | 🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | 🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | 🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) | 🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | 🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) | 🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | 🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | 🌟 [diffusers](https://github.com/huggingface/diffusers) | 🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) | 🌟 [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | 🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | 🌟 [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | 🌟 [ManiSkill](https://github.com/haosulab/ManiSkill)

---
BIN  apps/assets/parallel_sim.gif  (new file, binary file not shown, 897 KiB)
BIN  apps/assets/parallel_sim2.gif  (new file, binary file not shown, 711 KiB)
@@ -547,7 +547,9 @@ def extract_urdf(

    # Convert to URDF and recover attrs by GPT.
    filename = "sample"
-    urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+    urdf_convertor = URDFGenerator(
+        GPT_CLIENT, render_view_num=4, decompose_convex=True
+    )
    asset_attrs = {
        "version": VERSION,
        "gs_model": f"{urdf_convertor.output_mesh_dir}/(unknown)_gs.ply",
embodied_gen/data/convex_decomposer.py  (new file, 161 lines)
@@ -0,0 +1,161 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import logging
import multiprocessing as mp
import os

import coacd
import trimesh

logger = logging.getLogger(__name__)

__all__ = [
    "decompose_convex_coacd",
    "decompose_convex_mesh",
    "decompose_convex_mp",
]


def decompose_convex_coacd(
    filename: str, outfile: str, params: dict, verbose: bool = False
) -> None:
    coacd.set_log_level("info" if verbose else "warn")

    mesh = trimesh.load(filename, force="mesh")
    mesh = coacd.Mesh(mesh.vertices, mesh.faces)

    result = coacd.run_coacd(mesh, **params)
    combined = sum([trimesh.Trimesh(*m) for m in result])
    combined.export(outfile)


def decompose_convex_mesh(
    filename: str,
    outfile: str,
    threshold: float = 0.05,
    max_convex_hull: int = -1,
    preprocess_mode: str = "auto",
    preprocess_resolution: int = 30,
    resolution: int = 2000,
    mcts_nodes: int = 20,
    mcts_iterations: int = 150,
    mcts_max_depth: int = 3,
    pca: bool = False,
    merge: bool = True,
    seed: int = 0,
    verbose: bool = False,
) -> str:
    """Decompose a mesh into convex parts using the CoACD algorithm."""
    coacd.set_log_level("info" if verbose else "warn")

    if os.path.exists(outfile):
        logger.warning(f"Output file {outfile} already exists, removing it.")
        os.remove(outfile)

    params = dict(
        threshold=threshold,
        max_convex_hull=max_convex_hull,
        preprocess_mode=preprocess_mode,
        preprocess_resolution=preprocess_resolution,
        resolution=resolution,
        mcts_nodes=mcts_nodes,
        mcts_iterations=mcts_iterations,
        mcts_max_depth=mcts_max_depth,
        pca=pca,
        merge=merge,
        seed=seed,
    )

    try:
        decompose_convex_coacd(filename, outfile, params, verbose)
        if os.path.exists(outfile):
            return outfile
    except Exception as e:
        if verbose:
            print(f"Decompose convex first attempt failed: {e}.")

    if preprocess_mode != "on":
        try:
            params["preprocess_mode"] = "on"
            decompose_convex_coacd(filename, outfile, params, verbose)
            if os.path.exists(outfile):
                return outfile
        except Exception as e:
            if verbose:
                print(
                    f"Decompose convex second attempt with preprocess_mode='on' failed: {e}"
                )

    raise RuntimeError(f"Convex decomposition failed on (unknown)")


def decompose_convex_mp(
    filename: str,
    outfile: str,
    threshold: float = 0.05,
    max_convex_hull: int = -1,
    preprocess_mode: str = "auto",
    preprocess_resolution: int = 30,
    resolution: int = 2000,
    mcts_nodes: int = 20,
    mcts_iterations: int = 150,
    mcts_max_depth: int = 3,
    pca: bool = False,
    merge: bool = True,
    seed: int = 0,
    verbose: bool = False,
) -> str:
    """Decompose a mesh into convex parts using the CoACD algorithm in a separate process.

    See https://simulately.wiki/docs/toolkits/ConvexDecomp for details.
    """
    params = dict(
        threshold=threshold,
        max_convex_hull=max_convex_hull,
        preprocess_mode=preprocess_mode,
        preprocess_resolution=preprocess_resolution,
        resolution=resolution,
        mcts_nodes=mcts_nodes,
        mcts_iterations=mcts_iterations,
        mcts_max_depth=mcts_max_depth,
        pca=pca,
        merge=merge,
        seed=seed,
    )

    ctx = mp.get_context("spawn")
    p = ctx.Process(
        target=decompose_convex_coacd,
        args=(filename, outfile, params, verbose),
    )
    p.start()
    p.join()
    if p.exitcode == 0 and os.path.exists(outfile):
        return outfile

    if preprocess_mode != "on":
        params["preprocess_mode"] = "on"
        p = ctx.Process(
            target=decompose_convex_coacd,
            args=(filename, outfile, params, verbose),
        )
        p.start()
        p.join()
        if p.exitcode == 0 and os.path.exists(outfile):
            return outfile

    raise RuntimeError(f"Convex decomposition failed on (unknown)")
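Both decomposition entry points share the same control flow: try CoACD once with the requested settings, and on failure retry once with `preprocess_mode="on"` before giving up (`decompose_convex_mp` additionally isolates each attempt in a spawned process so a native crash cannot take down the caller). The retry skeleton can be exercised with the stdlib alone; `_toy_decompose` is a stand-in for `decompose_convex_coacd`, not the real function:

```python
import os
import tempfile

def _toy_decompose(outfile: str, params: dict) -> None:
    # Stand-in worker: only succeeds when preprocessing is forced on,
    # simulating a mesh that CoACD cannot handle in "auto" mode.
    if params["preprocess_mode"] != "on":
        raise RuntimeError("simulated CoACD failure")
    with open(outfile, "w") as f:
        f.write("convex parts")

def decompose_with_retry(outfile: str, preprocess_mode: str = "auto") -> str:
    params = {"preprocess_mode": preprocess_mode}
    try:
        _toy_decompose(outfile, params)
        if os.path.exists(outfile):
            return outfile
    except Exception:
        pass  # fall through to the forced-preprocessing retry
    if preprocess_mode != "on":
        params["preprocess_mode"] = "on"
        _toy_decompose(outfile, params)
        if os.path.exists(outfile):
            return outfile
    raise RuntimeError(f"convex decomposition failed for {outfile}")

result = decompose_with_retry(os.path.join(tempfile.mkdtemp(), "mesh.obj"))
```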
@@ -16,13 +16,17 @@

import logging
+import multiprocessing as mp
import os
from typing import Tuple, Union

+import coacd
import igraph
import numpy as np
import pyvista as pv
import spaces
import torch
import trimesh
import utils3d
from pymeshfix import _meshfix
from tqdm import tqdm

@@ -33,7 +37,9 @@ logging.basicConfig(
logger = logging.getLogger(__name__)

-__all__ = ["MeshFixer"]
+__all__ = [
+    "MeshFixer",
+]


def _radical_inverse(base, n):
embodied_gen/envs/pick_embodiedgen.py  (new file, 389 lines)
@@ -0,0 +1,389 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

import json
import os
from copy import deepcopy

import numpy as np
import sapien
import torch
import torchvision.transforms as transforms
from mani_skill.envs.sapien_env import BaseEnv
from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils
from mani_skill.utils.building import actors
from mani_skill.utils.registration import register_env
from mani_skill.utils.structs.actor import Actor
from mani_skill.utils.structs.pose import Pose
from mani_skill.utils.structs.types import (
    GPUMemoryConfig,
    SceneConfig,
    SimConfig,
)
from mani_skill.utils.visualization.misc import tile_images
from tqdm import tqdm

from embodied_gen.models.gs_model import GaussianOperator
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import bfs_placement, quaternion_multiply
from embodied_gen.utils.log import logger
from embodied_gen.utils.process_media import alpha_blend_rgba
from embodied_gen.utils.simulation import (
    SIM_COORD_ALIGN,
    load_assets_from_layout_file,
)

__all__ = ["PickEmbodiedGen"]


@register_env("PickEmbodiedGen-v1", max_episode_steps=100)
class PickEmbodiedGen(BaseEnv):
    SUPPORTED_ROBOTS = ["panda", "panda_wristcam", "fetch"]
    goal_thresh = 0.0

    def __init__(
        self,
        *args,
        robot_uids: str | list[str] = "panda",
        robot_init_qpos_noise: float = 0.02,
        num_envs: int = 1,
        reconfiguration_freq: int = None,
        **kwargs,
    ):
        self.robot_init_qpos_noise = robot_init_qpos_noise
        if reconfiguration_freq is None:
            if num_envs == 1:
                reconfiguration_freq = 1
            else:
                reconfiguration_freq = 0

        # Init params from kwargs.
        layout_file = kwargs.pop("layout_file", None)
        replace_objs = kwargs.pop("replace_objs", True)
        self.enable_grasp = kwargs.pop("enable_grasp", False)
        self.init_quat = kwargs.pop("init_quat", [0.7071, 0, 0, 0.7071])
        # Add small offset in z-axis to avoid collision.
        self.objs_z_offset = kwargs.pop("objs_z_offset", 0.002)
        self.robot_z_offset = kwargs.pop("robot_z_offset", 0.002)

        self.layouts = self.init_env_layouts(
            layout_file, num_envs, replace_objs
        )
        self.robot_pose = self.compute_robot_init_pose(
            self.layouts, num_envs, self.robot_z_offset
        )
        self.env_actors = dict()
        self.image_transform = transforms.PILToTensor()

        super().__init__(
            *args,
            robot_uids=robot_uids,
            reconfiguration_freq=reconfiguration_freq,
            num_envs=num_envs,
            **kwargs,
        )

        self.bg_images = dict()
        if self.render_mode == "hybrid":
            self.bg_images = self.render_gs3d_images(
                self.layouts, num_envs, self.init_quat
            )

    @staticmethod
    def init_env_layouts(
        layout_file: str, num_envs: int, replace_objs: bool
    ) -> list[LayoutInfo]:
        layout = LayoutInfo.from_dict(json.load(open(layout_file, "r")))
        layouts = []
        for env_idx in range(num_envs):
            if replace_objs and env_idx > 0:
                layout = bfs_placement(deepcopy(layout))
            layouts.append(layout)

        return layouts

    @staticmethod
    def compute_robot_init_pose(
        layouts: list[LayoutInfo], num_envs: int, z_offset: float = 0.0
    ) -> list[list[float]]:
        robot_pose = []
        for env_idx in range(num_envs):
            layout = layouts[env_idx]
            robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
            x, y, z, qx, qy, qz, qw = layout.position[robot_node]
            robot_pose.append([x, y, z + z_offset, qw, qx, qy, qz])

        return robot_pose

    @property
    def _default_sim_config(self):
        return SimConfig(
            scene_config=SceneConfig(
                solver_position_iterations=30,
                # contact_offset=0.04,
                # rest_offset=0.001,
            ),
            # sim_freq=200,
            control_freq=50,
            gpu_memory_config=GPUMemoryConfig(
                max_rigid_contact_count=2**20, max_rigid_patch_count=2**19
            ),
        )

    @property
    def _default_sensor_configs(self):
        pose = sapien_utils.look_at(eye=[0.3, 0, 0.6], target=[-0.1, 0, 0.1])

        return [
            CameraConfig("base_camera", pose, 128, 128, np.pi / 2, 0.01, 100)
        ]

    @property
    def _default_human_render_camera_configs(self):
        pose = sapien_utils.look_at(
            eye=[0.9, 0.0, 1.1], target=[0.0, 0.0, 0.9]
        )

        return CameraConfig(
            "render_camera", pose, 256, 256, np.deg2rad(75), 0.01, 100
        )

    def _load_agent(self, options: dict):
        super()._load_agent(options, sapien.Pose(p=[-10, 0, 10]))

    def _load_scene(self, options: dict):
        all_objects = []
        logger.info(f"Loading assets and decomposing mesh collisions...")
        for env_idx in range(self.num_envs):
            env_actors = load_assets_from_layout_file(
                self.scene,
                self.layouts[env_idx],
                z_offset=self.objs_z_offset,
                init_quat=self.init_quat,
                env_idx=env_idx,
            )
            self.env_actors[f"env{env_idx}"] = env_actors
            all_objects.extend(env_actors.values())

        self.obj = all_objects[-1]
        for obj in all_objects:
            self.remove_from_state_dict_registry(obj)

        self.all_objects = Actor.merge(all_objects, name="all_objects")
        self.add_to_state_dict_registry(self.all_objects)

        self.goal_site = actors.build_sphere(
            self.scene,
            radius=self.goal_thresh,
            color=[0, 1, 0, 0],
            name="goal_site",
            body_type="kinematic",
            add_collision=False,
            initial_pose=sapien.Pose(),
        )
        self._hidden_objects.append(self.goal_site)

    def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
        with torch.device(self.device):
            b = len(env_idx)
            goal_xyz = torch.zeros((b, 3))
            goal_xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1
            self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))

            qpos = np.array(
                [
                    0.0,
                    np.pi / 8,
                    0,
                    -np.pi * 3 / 8,
                    0,
                    np.pi * 3 / 4,
                    np.pi / 4,
                    0.04,
                    0.04,
                ]
            )
            qpos = (
                np.random.normal(
                    0, self.robot_init_qpos_noise, (self.num_envs, len(qpos))
                )
                + qpos
            )
            qpos[:, -2:] = 0.04
            self.agent.robot.set_root_pose(np.array(self.robot_pose))
            self.agent.reset(qpos)
            self.agent.init_qpos = qpos
            self.agent.controller.controllers["gripper"].reset()

    def render_gs3d_images(
        self, layouts: list[LayoutInfo], num_envs: int, init_quat: list[float]
    ) -> dict[str, np.ndarray]:
        sim_coord_align = (
            torch.tensor(SIM_COORD_ALIGN).to(torch.float32).to(self.device)
        )
        cameras = self.scene.sensors.copy()
        cameras.update(self.scene.human_render_cameras)

        bg_node = layouts[0].relation[Scene3DItemEnum.BACKGROUND.value]
        gs_path = os.path.join(layouts[0].assets[bg_node], "gs_model.ply")
        raw_gs: GaussianOperator = GaussianOperator.load_from_ply(gs_path)
        bg_images = dict()
        for env_idx in tqdm(range(num_envs), desc="Pre-rendering Background"):
            layout = layouts[env_idx]
            x, y, z, qx, qy, qz, qw = layout.position[bg_node]
            qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
            init_pose = torch.tensor([x, y, z, qx, qy, qz, qw])
            gs_model = raw_gs.get_gaussians(instance_pose=init_pose)
            for key in cameras:
                camera = cameras[key]
                Ks = camera.camera.get_intrinsic_matrix()  # (n_env, 3, 3)
                c2w = camera.camera.get_model_matrix()  # (n_env, 4, 4)
                result = gs_model.render(
                    c2w[env_idx] @ sim_coord_align,
                    Ks[env_idx],
                    image_width=camera.config.width,
                    image_height=camera.config.height,
                )
                bg_images[f"{key}-env{env_idx}"] = result.rgb[..., ::-1]

        return bg_images

    def render(self):
        if self.render_mode is None:
            raise RuntimeError("render_mode is not set.")
        if self.render_mode == "human":
            return self.render_human()
        elif self.render_mode == "rgb_array":
            res = self.render_rgb_array()
            return res
        elif self.render_mode == "sensors":
            res = self.render_sensors()
            return res
        elif self.render_mode == "all":
            return self.render_all()
        elif self.render_mode == "hybrid":
            return self.hybrid_render()
        else:
            raise NotImplementedError(
                f"Unsupported render mode {self.render_mode}."
            )

    def render_rgb_array(
        self, camera_name: str = None, return_alpha: bool = False
    ):
        for obj in self._hidden_objects:
            obj.show_visual()
        self.scene.update_render(
            update_sensors=False, update_human_render_cameras=True
        )
        images = []
        render_images = self.scene.get_human_render_camera_images(
            camera_name, return_alpha
        )
        for image in render_images.values():
            images.append(image)
        if len(images) == 0:
            return None
        if len(images) == 1:
            return images[0]
        for obj in self._hidden_objects:
            obj.hide_visual()
        return tile_images(images)

    def render_sensors(self):
        images = []
        sensor_images = self.get_sensor_images()
        for image in sensor_images.values():
            for img in image.values():
                images.append(img)
        return tile_images(images)

    def hybrid_render(self):
        fg_images = self.render_rgb_array(
            return_alpha=True
        )  # (n_env, h, w, 3)
        images = []
        for key in self.bg_images:
            if "render_camera" not in key:
                continue
            env_idx = int(key.split("-env")[-1])
            rgba = alpha_blend_rgba(
                fg_images[env_idx].cpu().numpy(), self.bg_images[key]
            )
            images.append(self.image_transform(rgba))

        images = torch.stack(images, dim=0)
        images = images.permute(0, 2, 3, 1)

        return images[..., :3]

    def evaluate(self):
        obj_to_goal_pos = (
            self.obj.pose.p
        )  # self.goal_site.pose.p - self.obj.pose.p
        is_obj_placed = (
            torch.linalg.norm(obj_to_goal_pos, axis=1) <= self.goal_thresh
        )
        is_grasped = self.agent.is_grasping(self.obj)
        is_robot_static = self.agent.is_static(0.2)

        return dict(
            is_grasped=is_grasped,
            obj_to_goal_pos=obj_to_goal_pos,
            is_obj_placed=is_obj_placed,
            is_robot_static=is_robot_static,
            is_grasping=self.agent.is_grasping(self.obj),
            success=torch.logical_and(is_obj_placed, is_robot_static),
        )

    def _get_obs_extra(self, info: dict):
        return dict()

    def compute_dense_reward(self, obs: any, action: torch.Tensor, info: dict):
        tcp_to_obj_dist = torch.linalg.norm(
            self.obj.pose.p - self.agent.tcp.pose.p, axis=1
        )
        reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
        reward = reaching_reward

        is_grasped = info["is_grasped"]
        reward += is_grasped

        # obj_to_goal_dist = torch.linalg.norm(
        #     self.goal_site.pose.p - self.obj.pose.p, axis=1
        # )
        obj_to_goal_dist = torch.linalg.norm(
            self.obj.pose.p - self.obj.pose.p, axis=1
        )
        place_reward = 1 - torch.tanh(5 * obj_to_goal_dist)
        reward += place_reward * is_grasped

        reward += info["is_obj_placed"] * is_grasped

        static_reward = 1 - torch.tanh(
            5
            * torch.linalg.norm(self.agent.robot.get_qvel()[..., :-2], axis=1)
        )
        reward += static_reward * info["is_obj_placed"] * is_grasped

        reward[info["success"]] = 6
        return reward

    def compute_normalized_dense_reward(
        self, obs: any, action: torch.Tensor, info: dict
    ):
        return self.compute_dense_reward(obs=obs, action=action, info=info) / 6
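The dense reward above repeatedly uses the shaping term `1 - tanh(5 * d)`, which maps a nonnegative distance `d` into (0, 1]: exactly 1 at zero distance, smoothly decaying toward 0 as the gripper (or object) moves away. A quick stand-alone numeric check of that shaping curve:

```python
import math

def shaping(distance: float, scale: float = 5.0) -> float:
    """Smooth reward in (0, 1]: 1 at distance 0, approaching 0 far away."""
    return 1.0 - math.tanh(scale * distance)

# Evaluate at a few distances (meters): closer is strictly better.
rewards = [shaping(d) for d in (0.0, 0.05, 0.2, 1.0)]
```

Summing one such bounded term per stage (reach, grasp, place, hold still) plus a constant override on success is what keeps the total reward capped at 6, so `compute_normalized_dense_reward` can simply divide by 6.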
@@ -137,7 +137,7 @@ def entrypoint() -> None:
    sim_cli(
        layout_path=layout_path,
        output_dir=output_root,
-        robot_name="franka" if args.insert_robot else None,
+        insert_robot=args.insert_robot,
    )

    torch.cuda.empty_cache()
@@ -58,7 +58,7 @@ os.environ["GRADIO_ANALYTICS_ENABLED"] = "false"
os.environ["SPCONV_ALGO"] = "native"
random.seed(0)

-logger.info("Loading Models...")
+logger.info("Loading Image3D Models...")
DELIGHT = DelightingModel()
IMAGESR_MODEL = ImageRealESRGAN(outscale=4)
RBG_REMOVER = RembgRemover()
@@ -107,6 +107,7 @@ def parse_args():
        type=int,
        default=2,
    )
+    parser.add_argument("--disable_decompose_convex", action="store_true")
    args, unknown = parser.parse_known_args()

    return args
@@ -251,7 +252,11 @@ def entrypoint(**kwargs):
    mesh_glb_path = os.path.join(output_root, f"(unknown).glb")
    mesh.export(mesh_glb_path)

-    urdf_convertor = URDFGenerator(GPT_CLIENT, render_view_num=4)
+    urdf_convertor = URDFGenerator(
+        GPT_CLIENT,
+        render_view_num=4,
+        decompose_convex=not args.disable_decompose_convex,
+    )
    asset_attrs = {
        "version": VERSION,
        "gs_model": f"{urdf_convertor.output_mesh_dir}/(unknown)_gs.ply",
embodied_gen/scripts/parallel_sim.py  (new file, 148 lines)
@@ -0,0 +1,148 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.

from embodied_gen.utils.monkey_patches import monkey_patch_maniskill

monkey_patch_maniskill()
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal

import gymnasium as gym
import numpy as np
import torch
import tyro
from mani_skill.utils.wrappers import RecordEpisode
from tqdm import tqdm

import embodied_gen.envs.pick_embodiedgen
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.log import logger
from embodied_gen.utils.simulation import FrankaPandaGrasper


@dataclass
class ParallelSimConfig:
    """CLI parameters for parallel SAPIEN simulation."""

    # Environment configuration
    layout_file: str
    """Path to the layout JSON file"""
    output_dir: str
    """Directory to save recorded videos"""
    gym_env_name: str = "PickEmbodiedGen-v1"
    """Name of the Gym environment to use"""
    num_envs: int = 4
    """Number of parallel environments"""
    render_mode: Literal["rgb_array", "hybrid"] = "hybrid"
    """Rendering mode: rgb_array or hybrid"""
    enable_shadow: bool = True
    """Whether to enable shadows in rendering"""
    control_mode: str = "pd_joint_pos"
    """Control mode for the agent"""

    # Recording configuration
    max_steps_per_video: int = 1000
    """Maximum steps to record per video"""
    save_trajectory: bool = False
    """Whether to save trajectory data"""

    # Simulation parameters
    seed: int = 0
    """Random seed for environment reset"""
    warmup_steps: int = 50
    """Number of warmup steps before action computation"""
    reach_target_only: bool = True
    """Whether to only reach the target instead of executing the full action"""


def entrypoint(**kwargs):
    if kwargs is None or len(kwargs) == 0:
        cfg = tyro.cli(ParallelSimConfig)
    else:
        cfg = ParallelSimConfig(**kwargs)

    env = gym.make(
        cfg.gym_env_name,
        num_envs=cfg.num_envs,
        render_mode=cfg.render_mode,
        enable_shadow=cfg.enable_shadow,
        layout_file=cfg.layout_file,
        control_mode=cfg.control_mode,
    )
    env = RecordEpisode(
        env,
        cfg.output_dir,
        max_steps_per_video=cfg.max_steps_per_video,
        save_trajectory=cfg.save_trajectory,
    )
    env.reset(seed=cfg.seed)

    default_action = env.unwrapped.agent.init_qpos[:, :8]
    for _ in tqdm(range(cfg.warmup_steps), desc="SIM Warmup"):
        # action = env.action_space.sample()  # Random action
        obs, reward, terminated, truncated, info = env.step(default_action)

    grasper = FrankaPandaGrasper(
        env.unwrapped.agent,
        env.unwrapped.sim_config.control_freq,
    )

    layout_data = LayoutInfo.from_dict(json.load(open(cfg.layout_file, "r")))
    actions = defaultdict(list)
    # Plan the grasp reach pose for each manipulated object in each env.
    for env_idx in range(env.num_envs):
        actors = env.unwrapped.env_actors[f"env{env_idx}"]
        for node in layout_data.relation[
            Scene3DItemEnum.MANIPULATED_OBJS.value
        ]:
            action = grasper.compute_grasp_action(
                actor=actors[node]._objs[0],
                reach_target_only=True,
                env_idx=env_idx,
            )
            actions[node].append(action)

    # Execute the planned actions for each manipulated object in each env.
    for node in actions:
        max_env_steps = 0
        for env_idx in range(env.num_envs):
            if actions[node][env_idx] is None:
                continue
            max_env_steps = max(max_env_steps, len(actions[node][env_idx]))

        action_tensor = np.ones(
            (max_env_steps, env.num_envs, env.action_space.shape[-1])
        )
        action_tensor *= default_action[None, ...]
        for env_idx in range(env.num_envs):
            action = actions[node][env_idx]
            if action is None:
                continue
            action_tensor[: len(action), env_idx, :] = action

        for step in tqdm(range(max_env_steps), desc=f"Grasping: {node}"):
            action = torch.Tensor(action_tensor[step]).to(env.unwrapped.device)
            env.unwrapped.agent.set_action(action)
            obs, reward, terminated, truncated, info = env.step(action)

    env.close()
    logger.info(f"Results saved in {cfg.output_dir}")


if __name__ == "__main__":
    entrypoint()
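The execution loop in this new file packs per-env action sequences of different lengths (some of which may be `None` when planning failed) into a single `(max_steps, num_envs, act_dim)` tensor, padding with a hold-position default action. A NumPy-only sketch of that padding scheme, with toy shapes standing in for the real `init_qpos` and `action_space`:

```python
import numpy as np

# Toy stand-ins: 3 parallel envs, 8-dim actions, zero "hold" action.
num_envs, act_dim = 3, 8
default_action = np.zeros((num_envs, act_dim))  # stand-in for init_qpos[:, :8]
per_env = [np.ones((5, act_dim)), None, np.ones((2, act_dim))]

# Pad every sequence to the longest one; envs with None (or short
# sequences) simply keep the default action for the remaining steps.
max_steps = max(len(a) for a in per_env if a is not None)
action_tensor = np.ones((max_steps, num_envs, act_dim))
action_tensor *= default_action[None, ...]
for env_idx, action in enumerate(per_env):
    if action is None:
        continue
    action_tensor[: len(action), env_idx, :] = action
```

Stepping the vectorized env with `action_tensor[step]` then drives all envs in lockstep, which is why the padding is needed at all.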
@@ -35,6 +35,9 @@ from embodied_gen.utils.simulation import (
    SIM_COORD_ALIGN,
    FrankaPandaGrasper,
    SapienSceneManager,
    load_assets_from_layout_file,
    load_mani_skill_robot,
    render_images,
)

@@ -50,8 +53,8 @@ class SapienSimConfig:
        default_factory=lambda: [0.7071, 0, 0, 0.7071]
    )  # xyzw
    device: str = "cuda"
    robot_name: str | None = None
    control_freq: int = 40
    control_freq: int = 50
    insert_robot: bool = False
    # Camera settings.
    render_interval: int = 10
    num_cameras: int = 3
@@ -91,48 +94,41 @@ def entrypoint(**kwargs):
        layout_data = json.load(f)
    layout_data: LayoutInfo = LayoutInfo.from_dict(layout_data)

    scene_manager.load_assets_from_layout_file(
    actors = load_assets_from_layout_file(
        scene_manager.scene,
        layout_data,
        cfg.z_offset,
        cfg.init_quat,
        cfg.robot_name,
        cfg.control_freq,
    )
    agent = load_mani_skill_robot(
        scene_manager.scene, layout_data, cfg.control_freq
    )

    frames = defaultdict(list)
    image_cnt = 0
    for step in tqdm(range(cfg.sim_step), desc="Simulation"):
        scene_manager.scene.step()
        if cfg.robot_name is not None:
            scene_manager.robot.reset(scene_manager.robot.init_qpos)
            agent.reset(agent.init_qpos)
        if step % cfg.render_interval != 0:
            continue
        scene_manager.scene.update_render()
        image_cnt += 1
        for camera in scene_manager.cameras:
            camera.take_picture()
            images = scene_manager.render_images(
                camera, render_keys=cfg.render_keys
            )
            images = render_images(camera, cfg.render_keys)
            frames[camera.name].append(images)

    grasp_frames = defaultdict(dict)
    if cfg.robot_name is not None:
        sim_steps_per_control = cfg.sim_freq // cfg.control_freq
        control_timestep = 1.0 / cfg.control_freq
    actions = dict()
    if cfg.insert_robot:
        grasper = FrankaPandaGrasper(
            scene_manager.robot,
            scene_manager,
            sim_steps_per_control,
            control_timestep,
            agent,
            cfg.control_freq,
        )
        for node in layout_data.relation[
            Scene3DItemEnum.MANIPULATED_OBJS.value
        ]:
            grasp_frames[node] = grasper.render_grasp(
                scene_manager.actors[node],
                scene_manager.cameras,
                cfg.render_keys,
            actions[node] = grasper.compute_grasp_action(
                actor=actors[node], reach_target_only=True
            )

    if "Foreground" not in cfg.render_keys:
@@ -160,22 +156,36 @@ def entrypoint(**kwargs):
            bg_images[camera.name] = result.rgb[..., ::-1]

    video_frames = []
    for camera in scene_manager.cameras:
    for idx, camera in enumerate(scene_manager.cameras):
        # Scene rendering
        for step in tqdm(range(image_cnt), desc="Rendering"):
            rgba = alpha_blend_rgba(
                frames[camera.name][step]["Foreground"],
                bg_images[camera.name],
            )
            video_frames.append(np.array(rgba))
        # Grasp rendering
        for node in grasp_frames:
            for frame in grasp_frames[node][camera.name]:
        if idx == 0:
            for step in range(image_cnt):
                rgba = alpha_blend_rgba(
                    frame["Foreground"], bg_images[camera.name]
                    frames[camera.name][step]["Foreground"],
                    bg_images[camera.name],
                )
                video_frames.append(np.array(rgba))

        # Grasp rendering
        for node in actions:
            if actions[node] is None:
                continue
            for action in tqdm(actions[node]):
                grasp_frames = scene_manager.step_action(
                    agent,
                    torch.Tensor(action[None, ...]),
                    scene_manager.cameras,
                    cfg.render_keys,
                    sim_steps_per_control=cfg.sim_freq // cfg.control_freq,
                )
                rgba = alpha_blend_rgba(
                    grasp_frames[camera.name][0]["Foreground"],
                    bg_images[camera.name],
                )
                video_frames.append(np.array(rgba))

        agent.reset(agent.init_qpos)

    os.makedirs(cfg.output_dir, exist_ok=True)
    video_path = f"{cfg.output_dir}/Iscene.mp4"
    imageio.mimsave(video_path, video_frames, fps=30)
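The rendering loop above composites the "Foreground" RGBA render over a background plate via `alpha_blend_rgba` (defined elsewhere in the repo). A hedged sketch of what such a compositor conventionally does, using a hypothetical `alpha_blend_rgb` helper so the signature is not mistaken for the project's actual one:

```python
import numpy as np

def alpha_blend_rgb(foreground_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Blend an RGBA foreground over an RGB background by the alpha channel."""
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0
    fg = foreground_rgba[..., :3].astype(np.float32)
    bg = background_rgb.astype(np.float32)
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)

# Fully opaque black foreground over a gray background: output is black.
fg = np.zeros((2, 2, 4), dtype=np.uint8)
fg[..., 3] = 255
bg = np.full((2, 2, 3), 200, dtype=np.uint8)
out = alpha_blend_rgb(fg, bg)
```

With alpha 0 everywhere the background passes through unchanged, which is what lets the hybrid render mode paste the simulated foreground onto a pre-rendered 3DGS background.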
@@ -42,7 +42,7 @@ from embodied_gen.validators.quality_checkers import (
os.environ["TOKENIZERS_PARALLELISM"] = "false"
random.seed(0)

logger.info("Loading Models...")
logger.info("Loading TEXT2IMG_MODEL...")
SEMANTIC_CHECKER = SemanticConsistChecker(GPT_CLIENT)
SEG_CHECKER = ImageSegChecker(GPT_CLIENT)
TXTGEN_CHECKER = TextGenAlignChecker(GPT_CLIENT)
@@ -170,6 +170,7 @@ def text_to_3d(**kwargs) -> dict:
        seed=random.randint(0, 100000) if seed_3d is None else seed_3d,
        n_retry=args.n_asset_retry,
        keep_intermediate=args.keep_intermediate,
        disable_decompose_convex=args.disable_decompose_convex,
    )
    mesh_path = f"{node_save_dir}/result/mesh/{save_node}.obj"
    image_path = render_asset3d(
@@ -270,6 +271,7 @@ def parse_args():
        help="Random seed for 3D generation",
    )
    parser.add_argument("--keep_intermediate", action="store_true")
    parser.add_argument("--disable_decompose_convex", action="store_true")

    args, unknown = parser.parse_known_args()
@@ -18,6 +18,7 @@ import os
import sys
import zipfile

import numpy as np
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf
@@ -150,3 +151,68 @@ def monkey_patch_pano2room():
        self.inpaint_pipe = pipe

    SDFTInpainter.__init__ = patched_sd_inpaint_init


def monkey_patch_maniskill():
    from mani_skill.envs.scene import ManiSkillScene

    def get_sensor_images(
        self, obs: dict[str, any]
    ) -> dict[str, dict[str, torch.Tensor]]:
        sensor_data = dict()
        for name, sensor in self.sensors.items():
            sensor_data[name] = sensor.get_images(obs[name])
        return sensor_data

    def get_human_render_camera_images(
        self, camera_name: str = None, return_alpha: bool = False
    ) -> dict[str, torch.Tensor]:
        def get_rgba_tensor(camera, return_alpha):
            color = camera.get_obs(
                rgb=True, depth=False, segmentation=False, position=False
            )["rgb"]
            if return_alpha:
                seg_labels = camera.get_obs(
                    rgb=False, depth=False, segmentation=True, position=False
                )["segmentation"]
                masks = np.where((seg_labels.cpu() > 0), 255, 0).astype(
                    np.uint8
                )
                masks = torch.tensor(masks).to(color.device)
                color = torch.concat([color, masks], dim=-1)

            return color

        image_data = dict()
        if self.gpu_sim_enabled:
            if self.parallel_in_single_scene:
                for name, camera in self.human_render_cameras.items():
                    camera.camera._render_cameras[0].take_picture()
                    rgba = get_rgba_tensor(camera, return_alpha)
                    image_data[name] = rgba
            else:
                for name, camera in self.human_render_cameras.items():
                    if camera_name is not None and name != camera_name:
                        continue
                    assert camera.config.shader_config.shader_pack not in [
                        "rt",
                        "rt-fast",
                        "rt-med",
                    ], "ray tracing shaders do not work with parallel rendering"
                    camera.capture()
                    rgba = get_rgba_tensor(camera, return_alpha)
                    image_data[name] = rgba
        else:
            for name, camera in self.human_render_cameras.items():
                if camera_name is not None and name != camera_name:
                    continue
                camera.capture()
                rgba = get_rgba_tensor(camera, return_alpha)
                image_data[name] = rgba

        return image_data

    ManiSkillScene.get_sensor_images = get_sensor_images
    ManiSkillScene.get_human_render_camera_images = (
        get_human_render_camera_images
    )
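`monkey_patch_maniskill` above uses the standard runtime monkey-patching idiom: define a replacement function and assign it onto the imported class, so every existing and future instance picks up the new behavior. A self-contained sketch of the same idiom with a toy class standing in for `ManiSkillScene`:

```python
# Toy class standing in for an imported third-party class.
class Scene:
    def render(self) -> str:
        return "rgb"

def patched_render(self) -> str:
    # Extended behavior, e.g. also returning an alpha channel.
    return "rgb+alpha"

# Assigning onto the class rebinds the method for all instances.
Scene.render = patched_render
```

The patch must run before any code caches a reference to the original method, which is why `parallel_sim.py` calls `monkey_patch_maniskill()` before importing the env module.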
@@ -14,7 +14,7 @@
# implied. See the License for the specific language governing
# permissions and limitations under the License.


import json
import logging
import os
import xml.etree.ElementTree as ET
@@ -24,19 +24,20 @@ from typing import Literal
import mplib
import numpy as np
import sapien.core as sapien
import sapien.physx as physx
import torch
from PIL import Image, ImageColor
from scipy.spatial.transform import Rotation as R
from tqdm import tqdm
from embodied_gen.data.utils import DiffrastRender
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import quaternion_multiply

logger = logging.getLogger(__name__)
from mani_skill.agents.base_agent import BaseAgent
from mani_skill.envs.scene import ManiSkillScene
from mani_skill.examples.motionplanning.panda.utils import (
    compute_grasp_info_by_obb,
)
from mani_skill.utils.geometry.trimesh_utils import get_component_mesh
from PIL import Image, ImageColor
from scipy.spatial.transform import Rotation as R

from embodied_gen.data.utils import DiffrastRender
from embodied_gen.utils.enum import LayoutInfo, Scene3DItemEnum
from embodied_gen.utils.geometry import quaternion_multiply
from embodied_gen.utils.log import logger

COLORMAP = list(set(ImageColor.colormap.values()))
COLOR_PALETTE = np.array(
@@ -54,10 +55,267 @@ SIM_COORD_ALIGN = np.array(
__all__ = [
    "SIM_COORD_ALIGN",
    "FrankaPandaGrasper",
    "SimpleGrasper",
    "load_assets_from_layout_file",
    "load_mani_skill_robot",
    "render_images",
]
def load_actor_from_urdf(
    scene: ManiSkillScene | sapien.Scene,
    file_path: str,
    pose: sapien.Pose,
    env_idx: int = None,
    use_static: bool = False,
    update_mass: bool = False,
) -> sapien.pysapien.Entity:
    tree = ET.parse(file_path)
    root = tree.getroot()
    node_name = root.get("name")
    file_dir = os.path.dirname(file_path)

    visual_mesh = root.find('.//visual/geometry/mesh')
    visual_file = visual_mesh.get("filename")
    visual_scale = visual_mesh.get("scale", "1.0 1.0 1.0")
    visual_scale = np.array([float(x) for x in visual_scale.split()])

    collision_mesh = root.find('.//collision/geometry/mesh')
    collision_file = collision_mesh.get("filename")
    collision_scale = collision_mesh.get("scale", "1.0 1.0 1.0")
    collision_scale = np.array([float(x) for x in collision_scale.split()])

    visual_file = os.path.join(file_dir, visual_file)
    collision_file = os.path.join(file_dir, collision_file)
    static_fric = root.find('.//collision/gazebo/mu1').text
    dynamic_fric = root.find('.//collision/gazebo/mu2').text

    material = physx.PhysxMaterial(
        static_friction=np.clip(float(static_fric), 0.1, 0.7),
        dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
        restitution=0.05,
    )
    builder = scene.create_actor_builder()

    body_type = "static" if use_static else "dynamic"
    builder.set_physx_body_type(body_type)
    builder.add_multiple_convex_collisions_from_file(
        collision_file if body_type == "dynamic" else visual_file,
        material=material,
        scale=collision_scale if body_type == "dynamic" else visual_scale,
        # decomposition="coacd",
        # decomposition_params=dict(
        #     threshold=0.05, max_convex_hull=64, verbose=False
        # ),
    )

    builder.add_visual_from_file(visual_file, scale=visual_scale)
    builder.set_initial_pose(pose)
    if isinstance(scene, ManiSkillScene) and env_idx is not None:
        builder.set_scene_idxs([env_idx])

    actor = builder.build(name=f"{node_name}-{env_idx}")

    if update_mass and hasattr(actor.components[1], "mass"):
        node_mass = float(root.find('.//inertial/mass').get("value"))
        actor.components[1].set_mass(node_mass)

    return actor
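`load_actor_from_urdf` above pulls the mesh filenames and optional scales out of the URDF with fixed ElementTree XPath queries. A runnable sketch of just those queries against a minimal inline URDF (the `mug` asset and paths are made up for illustration):

```python
import xml.etree.ElementTree as ET

urdf = """<robot name="mug">
  <link name="base">
    <visual><geometry><mesh filename="mesh/mug.obj" scale="0.5 0.5 0.5"/></geometry></visual>
    <collision><geometry><mesh filename="mesh/mug_collision.obj"/></geometry></collision>
  </link>
</robot>"""
root = ET.fromstring(urdf)

visual_mesh = root.find('.//visual/geometry/mesh')
visual_file = visual_mesh.get("filename")
# scale is optional in URDF; default to unit scale, as the loader above does.
scale = [float(x) for x in visual_mesh.get("scale", "1.0 1.0 1.0").split()]
collision_file = root.find('.//collision/geometry/mesh').get("filename")
```

The filenames are relative to the URDF's directory, which is why the loader joins them with `os.path.dirname(file_path)` before handing them to the actor builder.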
def load_assets_from_layout_file(
    scene: ManiSkillScene | sapien.Scene,
    layout: LayoutInfo | str,
    z_offset: float = 0.0,
    init_quat: list[float] = [0, 0, 0, 1],
    env_idx: int = None,
) -> dict[str, sapien.pysapien.Entity]:
    """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.

    Args:
        scene (sapien.Scene | ManiSkillScene): The SAPIEN or ManiSkill scene to load assets into.
        layout (LayoutInfo): The layout information data.
        z_offset (float): Offset to apply to the Z-coordinate of non-context objects.
        init_quat (List[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
        env_idx (int): Environment index for the multi-environment setup.
    """
    if isinstance(layout, str) and layout.endswith(".json"):
        layout = LayoutInfo.from_dict(json.load(open(layout, "r")))

    actors = dict()
    for node in layout.assets:
        file_dir = layout.assets[node]
        file_name = f"{node.replace(' ', '_')}.urdf"
        urdf_file = os.path.join(file_dir, file_name)

        if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
            continue

        position = layout.position[node].copy()
        if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
            position[2] += z_offset

        use_static = (
            layout.relation.get(Scene3DItemEnum.CONTEXT.value, None) == node
        )

        # Combine the initial quaternion with the object quaternion.
        x, y, z, qx, qy, qz, qw = position
        qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
        actor = load_actor_from_urdf(
            scene,
            urdf_file,
            sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
            env_idx,
            use_static=use_static,
            update_mass=False,
        )
        actors[node] = actor

    return actors
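The loader above composes each object's quaternion with `init_quat` via the repo's `quaternion_multiply` (xyzw convention) and then reorders to wxyz for `sapien.Pose`. A sketch of an assumed-equivalent Hamilton product, purely to illustrate the convention and the reordering; the real helper lives in `embodied_gen.utils.geometry`:

```python
import numpy as np

def quat_multiply_xyzw(q1, q2):
    """Hamilton product of two quaternions in (x, y, z, w) order (assumed convention)."""
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return np.array([
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ])

identity = np.array([0.0, 0.0, 0.0, 1.0])
q = np.array([0.7071, 0.0, 0.0, 0.7071])  # ~90 deg about x, like init_quat above
composed = quat_multiply_xyzw(q, identity)

# sapien.Pose expects (w, x, y, z), hence the reorder in the loader.
qx, qy, qz, qw = composed
pose_q_wxyz = [qw, qx, qy, qz]
```

Composing with the identity leaves the quaternion unchanged, and composing with the conjugate returns (approximately) the identity, which is an easy sanity check for the convention.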
def load_mani_skill_robot(
    scene: sapien.Scene | ManiSkillScene,
    layout: LayoutInfo | str,
    control_freq: int = 20,
    robot_init_qpos_noise: float = 0.0,
    control_mode: str = "pd_joint_pos",
    backend_str: tuple[str, str] = ("cpu", "gpu"),
) -> BaseAgent:
    from mani_skill.agents import REGISTERED_AGENTS
    from mani_skill.envs.scene import ManiSkillScene
    from mani_skill.envs.utils.system.backend import (
        parse_sim_and_render_backend,
    )

    if isinstance(layout, str) and layout.endswith(".json"):
        layout = LayoutInfo.from_dict(json.load(open(layout, "r")))

    robot_name = layout.relation[Scene3DItemEnum.ROBOT.value]
    x, y, z, qx, qy, qz, qw = layout.position[robot_name]
    delta_z = 0.002  # Add a small offset to avoid collision.
    pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])

    if robot_name not in REGISTERED_AGENTS:
        logger.warning(
            f"Robot `{robot_name}` not registered, choose from {REGISTERED_AGENTS.keys()}; falling back to `panda`."
        )
        robot_name = "panda"

    ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
    backend = parse_sim_and_render_backend(*backend_str)
    if isinstance(scene, sapien.Scene):
        scene = ManiSkillScene([scene], device=backend_str[0], backend=backend)
    robot = ROBOT_CLS(
        scene=scene,
        control_freq=control_freq,
        control_mode=control_mode,
        initial_pose=pose,
    )

    # Set the robot's initial joint positions in radians (joint0-joint6 plus 2 fingers).
    qpos = np.array(
        [
            0.0,
            np.pi / 8,
            0,
            -np.pi * 3 / 8,
            0,
            np.pi * 3 / 4,
            np.pi / 4,
            0.04,
            0.04,
        ]
    )
    qpos = (
        np.random.normal(
            0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
        )
        + qpos
    )
    qpos[:, -2:] = 0.04
    robot.reset(qpos)
    robot.init_qpos = robot.robot.qpos
    robot.controller.controllers["gripper"].reset()

    return robot
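The initial-pose sampling in `load_mani_skill_robot` adds per-env Gaussian noise to a nominal Franka configuration and then pins both gripper fingers back to the fully open width (0.04 m) regardless of the noise. A NumPy-only sketch of that sampling, with a seeded generator instead of the global `np.random` state:

```python
import numpy as np

# Nominal Franka Panda configuration: 7 arm joints (rad) + 2 finger widths (m).
qpos_nominal = np.array(
    [0.0, np.pi / 8, 0, -np.pi * 3 / 8, 0, np.pi * 3 / 4, np.pi / 4, 0.04, 0.04]
)
num_envs, noise_std = 4, 0.02
rng = np.random.default_rng(0)

# One noisy copy per parallel env.
qpos = rng.normal(0, noise_std, (num_envs, len(qpos_nominal))) + qpos_nominal
qpos[:, -2:] = 0.04  # keep the fingers deterministic (fully open)
```

Pinning the fingers avoids starting an episode with a partially closed gripper, which would bias the grasp planner's reach poses.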
def render_images(
    camera: sapien.render.RenderCameraComponent,
    render_keys: list[
        Literal[
            "Color",
            "Segmentation",
            "Normal",
            "Mask",
            "Depth",
            "Foreground",
        ]
    ] = None,
) -> dict[str, Image.Image]:
    """Render images from a given SAPIEN camera.

    Args:
        camera (sapien.render.RenderCameraComponent): The camera to render from.
        render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).

    Returns:
        Dict[str, Image.Image]: Dictionary of rendered images.
    """
    if render_keys is None:
        render_keys = [
            "Color",
            "Segmentation",
            "Normal",
            "Mask",
            "Depth",
            "Foreground",
        ]

    results: dict[str, Image.Image] = {}
    if "Color" in render_keys:
        color = camera.get_picture("Color")
        color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
        results["Color"] = Image.fromarray(color_rgb)

    if "Mask" in render_keys:
        color = camera.get_picture("Color")  # RGBA; alpha encodes coverage.
        alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
        results["Mask"] = Image.fromarray(alpha)

    if "Segmentation" in render_keys:
        seg_labels = camera.get_picture("Segmentation")
        label0 = seg_labels[..., 0].astype(np.uint8)
        seg_color = COLOR_PALETTE[label0]
        results["Segmentation"] = Image.fromarray(seg_color)

    if "Foreground" in render_keys:
        seg_labels = camera.get_picture("Segmentation")
        label0 = seg_labels[..., 0]
        mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
        color = camera.get_picture("Color")
        color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
        foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
        results["Foreground"] = Image.fromarray(foreground)

    if "Normal" in render_keys:
        normal = camera.get_picture("Normal")[..., :3]
        normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
        results["Normal"] = Image.fromarray(normal_img)

    if "Depth" in render_keys:
        position_map = camera.get_picture("Position")
        depth = -position_map[..., 2]
        alpha = torch.tensor(
            camera.get_picture("Color")[..., 3], dtype=torch.float32
        )
        norm_depth = DiffrastRender.normalize_map_by_mask(
            torch.tensor(depth), alpha
        )
        depth_img = (norm_depth * 255).to(torch.uint8).numpy()
        results["Depth"] = Image.fromarray(depth_img)

    return results
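The "Foreground" branch of `render_images` above turns the segmentation labels into an alpha channel: pixels whose label is greater than 1 (i.e. not background or ground plane) become opaque, everything else transparent. A NumPy-only sketch of that masking step with tiny toy buffers in place of the camera pictures:

```python
import numpy as np

# Toy stand-ins for camera.get_picture("Color") / ("Segmentation").
color_rgb = np.full((2, 3, 3), 128, dtype=np.uint8)
seg_labels = np.array([[0, 2, 5], [1, 3, 0]])  # per-pixel actor ids

# Labels 0/1 are treated as background; everything else is foreground.
mask = np.where(seg_labels > 1, 255, 0).astype(np.uint8)
foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)  # RGBA
```

This RGBA image is exactly what the hybrid render mode later alpha-blends over the pre-rendered background plate.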
class SapienSceneManager:
    """A class to manage the SAPIEN simulator."""

@@ -81,11 +339,6 @@ class SapienSceneManager:
        sapien.render.set_ray_tracing_path_depth(10)
        sapien.render.set_ray_tracing_denoiser("oidn")

        # TODO: Support GPU.
        # import sapien.physx as physx
        # physx.enable_gpu()
        # scene = sapien.Scene(systems=[sapien.physx.PhysxGpuSystem(), sapien.render.RenderSystem()])

        scene = sapien.Scene()
        scene.set_timestep(1 / self.sim_freq)

@@ -114,12 +367,13 @@ class SapienSceneManager:

    def step_action(
        self,
        agent: BaseAgent,
        action: torch.Tensor,
        sim_steps_per_control: int,
        cameras: list[sapien.render.RenderCameraComponent],
        render_keys: list[str],
        sim_steps_per_control: int = 1,
    ) -> dict:
        self.robot.set_action(action)
        agent.set_action(action)
        frames = defaultdict(list)
        for _ in range(sim_steps_per_control):
            self.scene.step()
@@ -127,125 +381,11 @@ class SapienSceneManager:
            self.scene.update_render()
            for camera in cameras:
                camera.take_picture()
                images = self.render_images(camera, render_keys=render_keys)
                images = render_images(camera, render_keys=render_keys)
                frames[camera.name].append(images)

        return frames

    def load_exists_robot(
        self,
        robot_name: str,
        pose: sapien.Pose,
        control_freq: int = 20,
        robot_init_qpos_noise: float = 0.02,
        control_mode: str = "pd_joint_pos",
        backend_str: tuple[str, str] = ("cpu", "gpu"),
    ):
        from mani_skill.agents import REGISTERED_AGENTS
        from mani_skill.envs.scene import ManiSkillScene
        from mani_skill.envs.utils.system.backend import (
            parse_sim_and_render_backend,
        )

        if robot_name == "franka":
            robot_name = "panda"  # remapping
        if robot_name not in REGISTERED_AGENTS:
            raise KeyError(
                f"Robot `{robot_name}` not registered, choose from {REGISTERED_AGENTS.keys()}."
            )

        ROBOT_CLS = REGISTERED_AGENTS[robot_name].agent_cls
        # backend = parse_sim_and_render_backend("physx_cuda", "gpu")
        backend = parse_sim_and_render_backend(*backend_str)
        scene = ManiSkillScene(
            [self.scene], device=backend_str[0], backend=backend
        )
        robot = ROBOT_CLS(
            scene=scene,
            control_freq=control_freq,
            control_mode=control_mode,
            initial_pose=pose,
        )
        # Set the robot's initial joint positions in radians (joint0-joint6 plus 2 fingers).
        qpos = np.array(
            [
                0.0,
                np.pi / 8,
                0,
                -np.pi * 3 / 8,
                0,
                np.pi * 3 / 4,
                np.pi / 4,
                0.04,
                0.04,
            ]
        )
        qpos = (
            np.random.normal(
                0, robot_init_qpos_noise, (len(scene.sub_scenes), len(qpos))
            )
            + qpos
        )
        qpos[:, -2:] = 0.04
        robot.reset(qpos)
        robot.init_qpos = robot.robot.qpos
        robot.controller.controllers["gripper"].reset()

        return robot
def load_actor_from_urdf(
|
||||
self,
|
||||
file_path: str,
|
||||
pose: sapien.Pose,
|
||||
use_static: bool = False,
|
||||
update_mass: bool = False,
|
||||
) -> sapien.pysapien.Entity:
|
||||
"""Load an actor from a URDF file into the scene.
|
||||
|
||||
Args:
|
||||
file_path (str): Path to the URDF file.
|
||||
pose (sapien.Pose): Pose of the actor.
|
||||
use_static (bool): Whether to create a static actor.
|
||||
update_mass (bool): Whether to update the actor's mass from the URDF.
|
||||
|
||||
Returns:
|
||||
sapien.pysapien.Entity: The loaded actor.
|
||||
"""
|
||||
tree = ET.parse(file_path)
|
||||
root = tree.getroot()
|
||||
node_name = root.get("name")
|
||||
file_dir = os.path.dirname(file_path)
|
||||
visual_file = root.find('.//visual/geometry/mesh').get("filename")
|
||||
collision_file = root.find('.//collision/geometry/mesh').get(
|
||||
"filename"
|
||||
)
|
||||
visual_file = os.path.join(file_dir, visual_file)
|
||||
collision_file = os.path.join(file_dir, collision_file)
|
||||
static_fric = root.find('.//collision/gazebo/mu1').text
|
||||
dynamic_fric = root.find('.//collision/gazebo/mu2').text
|
||||
|
||||
material = self.scene.create_physical_material(
|
||||
static_friction=np.clip(float(static_fric), 0.1, 0.7),
|
||||
dynamic_friction=np.clip(float(dynamic_fric), 0.1, 0.6),
|
||||
restitution=0.05,
|
||||
)
|
||||
builder = self.scene.create_actor_builder()
|
||||
body_type = "static" if use_static else "dynamic"
|
||||
builder.set_physx_body_type(body_type)
|
||||
builder.add_multiple_convex_collisions_from_file(
|
||||
collision_file, material=material
|
||||
)
|
||||
builder.add_visual_from_file(visual_file)
|
||||
actor = builder.build(name=node_name)
|
||||
actor.set_name(node_name)
|
||||
actor.set_pose(pose)
|
||||
|
||||
if update_mass and hasattr(actor.components[1], "mass"):
|
||||
node_mass = float(root.find('.//inertial/mass').get("value"))
|
||||
actor.components[1].set_mass(node_mass)
|
||||
|
||||
return actor
|
||||
|
||||
def create_camera(
|
||||
self,
|
||||
cam_name: str,
|
||||
@ -344,257 +484,78 @@ class SapienSceneManager:
|
||||
|
||||
return self.cameras
|
||||
|
||||
def render_images(
|
||||
self,
|
||||
camera: sapien.render.RenderCameraComponent,
|
||||
render_keys: list[
|
||||
Literal[
|
||||
"Color",
|
||||
"Segmentation",
|
||||
"Normal",
|
||||
"Mask",
|
||||
"Depth",
|
||||
"Foreground",
|
||||
]
|
||||
] = None,
|
||||
) -> dict[str, Image.Image]:
|
||||
"""Render images from a given camera.
|
||||
|
||||
Args:
|
||||
camera (sapien.render.RenderCameraComponent): The camera to render from.
|
||||
render_keys (List[str]): Types of images to render (e.g., Color, Segmentation).
|
||||
|
||||
Returns:
|
||||
Dict[str, Image.Image]: Dictionary of rendered images.
|
||||
"""
|
||||
if render_keys is None:
|
||||
render_keys = [
|
||||
"Color",
|
||||
"Segmentation",
|
||||
"Normal",
|
||||
"Mask",
|
||||
"Depth",
|
||||
"Foreground",
|
||||
]
|
||||
|
||||
results: dict[str, Image.Image] = {}
|
||||
if "Color" in render_keys:
|
||||
color = camera.get_picture("Color")
|
||||
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
|
||||
results["Color"] = Image.fromarray(color_rgb)
|
||||
|
||||
if "Mask" in render_keys:
|
||||
alpha = (np.clip(color[..., 3], 0, 1) * 255).astype(np.uint8)
|
||||
results["Mask"] = Image.fromarray(alpha)
|
||||
|
||||
if "Segmentation" in render_keys:
|
||||
seg_labels = camera.get_picture("Segmentation")
|
||||
label0 = seg_labels[..., 0].astype(np.uint8)
|
||||
seg_color = self.COLOR_PALETTE[label0]
|
||||
results["Segmentation"] = Image.fromarray(seg_color)
|
||||
|
||||
if "Foreground" in render_keys:
|
||||
seg_labels = camera.get_picture("Segmentation")
|
||||
label0 = seg_labels[..., 0]
|
||||
mask = np.where((label0 > 1), 255, 0).astype(np.uint8)
|
||||
color = camera.get_picture("Color")
|
||||
color_rgb = (np.clip(color[..., :3], 0, 1) * 255).astype(np.uint8)
|
||||
foreground = np.concatenate([color_rgb, mask[..., None]], axis=-1)
|
||||
results["Foreground"] = Image.fromarray(foreground)
|
||||
|
||||
if "Normal" in render_keys:
|
||||
normal = camera.get_picture("Normal")[..., :3]
|
||||
normal_img = (((normal + 1) / 2) * 255).astype(np.uint8)
|
||||
results["Normal"] = Image.fromarray(normal_img)
|
||||
|
||||
if "Depth" in render_keys:
|
||||
position_map = camera.get_picture("Position")
|
||||
depth = -position_map[..., 2]
|
||||
alpha = torch.tensor(color[..., 3], dtype=torch.float32)
|
||||
norm_depth = DiffrastRender.normalize_map_by_mask(
|
||||
torch.tensor(depth), alpha
|
||||
)
|
||||
depth_img = (norm_depth * 255).to(torch.uint8).numpy()
|
||||
results["Depth"] = Image.fromarray(depth_img)
|
||||
|
||||
return results
|
||||
|
||||
    def load_assets_from_layout_file(
        self,
        layout: LayoutInfo,
        z_offset: float = 0.0,
        init_quat: list[float] = [0, 0, 0, 1],
        robot_name: str | None = None,
        control_freq: int = 40,
        robot_init_qpos_noise: float = 0,
    ) -> None:
        """Load assets from `EmbodiedGen` layout-gen output and create actors in the scene.

        Args:
            layout (LayoutInfo): The layout information data.
            z_offset (float): Offset applied to the Z-coordinate of non-context objects.
            init_quat (list[float]): Initial quaternion (x, y, z, w) for orientation adjustment.
            robot_name (str | None): Name of the robot to load; if None, no robot is loaded.
            control_freq (int): Control frequency for the robot, default is 40.
            robot_init_qpos_noise (float): Noise added to the robot's initial joint positions, default is 0.
        """
        for node in layout.assets:
            file_dir = layout.assets[node]
            file_name = f"{node.replace(' ', '_')}.urdf"
            urdf_file = os.path.join(file_dir, file_name)

            if layout.objs_mapping[node] == Scene3DItemEnum.BACKGROUND.value:
                continue

            position = layout.position[node].copy()
            if layout.objs_mapping[node] != Scene3DItemEnum.CONTEXT.value:
                position[2] += z_offset

            use_static = (
                layout.relation.get(Scene3DItemEnum.CONTEXT.value, None)
                == node
            )

            # Combine the initial quaternion with the object's quaternion.
            x, y, z, qx, qy, qz, qw = position
            qx, qy, qz, qw = quaternion_multiply([qx, qy, qz, qw], init_quat)
            actor = self.load_actor_from_urdf(
                urdf_file,
                sapien.Pose(p=[x, y, z], q=[qw, qx, qy, qz]),
                use_static=use_static,
                update_mass=False,
            )
            self.actors[node] = actor

        if robot_name is not None:
            robot_node = layout.relation[Scene3DItemEnum.ROBOT.value]
            x, y, z, qx, qy, qz, qw = layout.position[robot_node]
            delta_z = 0.002  # Small offset to avoid collision.
            pose = sapien.Pose([x, y, z + delta_z], [qw, qx, qy, qz])
            self.robot = self.load_exists_robot(
                robot_name, pose, control_freq, robot_init_qpos_noise
            )

        return

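`load_assets_from_layout_file` composes each asset's layout quaternion with `init_quat` in (x, y, z, w) order, then reorders to SAPIEN's (w, x, y, z) convention when constructing the `Pose`. A small sketch of that composition — the Hamilton-product helper below is a stand-in for the project's `quaternion_multiply`, whose implementation is not shown in this diff:

```python
# Stand-in for the project's quaternion_multiply: Hamilton product of two
# quaternions, both given in (x, y, z, w) order.
def quaternion_multiply(q1: list, q2: list) -> list:
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    return [
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
    ]

# An identity init_quat leaves the layout orientation unchanged.
qx, qy, qz, qw = quaternion_multiply([0, 0, 0, 1], [0, 0, 0, 1])
# sapien.Pose expects (w, x, y, z) order, so reorder before constructing it.
pose_q = [qw, qx, qy, qz]
```

The reorder step is why the code above unpacks `qx, qy, qz, qw` and then passes `q=[qw, qx, qy, qz]` to `sapien.Pose`.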
class FrankaPandaGrasper(object):
    """Grasp planner for the Franka Panda robot arm.

    Args:
        agent: See `mani_skill.agents.robots.panda.panda`.
        control_freq: Control frequency of the robot.
        joint_vel_limits: Velocity limit for all joints (default: 2.0).
        joint_acc_limits: Acceleration limit for all joints (default: 1.0).
        finger_length: Length of the robot's fingers (default: 0.025).
    """

    def __init__(
        self,
        agent: BaseAgent,
        control_freq: float,
        joint_vel_limits: float = 2.0,
        joint_acc_limits: float = 1.0,
        finger_length: float = 0.025,
    ) -> None:
        self.agent = agent
        self.robot = agent.robot
        self.control_freq = control_freq
        self.control_timestep = 1 / control_freq
        self.joint_vel_limits = joint_vel_limits
        self.joint_acc_limits = joint_acc_limits
        self.finger_length = finger_length
        self.planners = self._setup_planner()

    def _setup_planner(self) -> list[mplib.Planner]:
        planners = []
        for pose in self.robot.pose:
            link_names = [link.get_name() for link in self.robot.get_links()]
            joint_names = [
                joint.get_name() for joint in self.robot.get_active_joints()
            ]
            planner = mplib.Planner(
                urdf=self.agent.urdf_path,
                srdf=self.agent.urdf_path.replace(".urdf", ".srdf"),
                user_link_names=link_names,
                user_joint_names=joint_names,
                move_group="panda_hand_tcp",
                joint_vel_limits=np.ones(7) * self.joint_vel_limits,
                joint_acc_limits=np.ones(7) * self.joint_acc_limits,
            )
            planner.set_base_pose(pose.raw_pose[0].tolist())
            planners.append(planner)

        return planners

    def control_gripper(
        self,
        gripper_state: Literal[-1, 1],
        n_step: int = 10,
        env_idx: int = 0,
    ) -> np.ndarray:
        qpos = self.robot.get_qpos()[env_idx, :-2].cpu().numpy()
        actions = []
        for _ in range(n_step):
            action = np.hstack([qpos, gripper_state])[None, ...]
            actions.append(action)

        return np.concatenate(actions, axis=0)

    def move_to_pose(
        self,
        pose: sapien.Pose,
        gripper_state: Literal[-1, 1],
        use_point_cloud: bool = False,
        n_max_step: int = 100,
        action_key: str = "position",
        env_idx: int = 0,
    ) -> np.ndarray:
        result = self.planners[env_idx].plan_qpos_to_pose(
            np.concatenate([pose.p, pose.q]),
            self.robot.get_qpos().cpu().numpy()[0],
            time_step=self.control_timestep,
            use_point_cloud=use_point_cloud,
        )

        # Fall back to screw motion planning if standard planning fails.
        if result["status"] != "Success":
            result = self.planners[env_idx].plan_screw(
                np.concatenate([pose.p, pose.q]),
                self.robot.get_qpos().cpu().numpy()[0],
                time_step=self.control_timestep,
@@ -602,44 +563,34 @@ class FrankaPandaGrasper(object):
            )

        if result["status"] != "Success":
            return

        sample_ratio = (len(result[action_key]) // n_max_step) + 1
        result[action_key] = result[action_key][::sample_ratio]

        n_step = len(result[action_key])
        actions = []
        for i in range(n_step):
            qpos = result[action_key][i]
            action = np.hstack([qpos, gripper_state])[None, ...]
            actions.append(action)

        return np.concatenate(actions, axis=0)

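`move_to_pose` caps the planned trajectory at roughly `n_max_step` waypoints with a stride-based subsample; the `// n_max_step + 1` arithmetic guarantees the stride is at least 1, so short plans pass through unchanged. The logic can be sketched in isolation:

```python
# Stride-based trajectory downsampling, as used in move_to_pose:
# stride = floor(len / n_max_step) + 1, so the stride is always >= 1
# and the result never greatly exceeds n_max_step waypoints.
def downsample(waypoints: list, n_max_step: int = 100) -> list:
    sample_ratio = (len(waypoints) // n_max_step) + 1
    return waypoints[::sample_ratio]

short = downsample(list(range(50)))   # stride 1: unchanged
long = downsample(list(range(250)))   # stride 3: 84 waypoints remain
```

Subsampling the plan rather than truncating it preserves the full motion at a coarser temporal resolution.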
    def compute_grasp_action(
        self,
        actor: sapien.pysapien.Entity,
        reach_target_only: bool = True,
        offset: tuple[float, float, float] = (0, 0, -0.05),
        env_idx: int = 0,
    ) -> np.ndarray:
        physx_rigid = actor.components[1]
        mesh = get_component_mesh(physx_rigid, to_world_frame=True)
        obb = mesh.bounding_box_oriented
        approaching = np.array([0, 0, -1])
        tcp_pose = self.agent.tcp.pose[env_idx]
        target_closing = (
            tcp_pose.to_transformation_matrix()[0, :3, 1].cpu().numpy()
        )
        grasp_info = compute_grasp_info_by_obb(
            obb,
@@ -647,60 +598,46 @@ class FrankaPandaGrasper(object):
            target_closing=target_closing,
            depth=self.finger_length,
        )
        closing, center = grasp_info["closing"], grasp_info["center"]
        raw_tcp_pose = tcp_pose.sp
        grasp_pose = self.agent.build_grasp_pose(approaching, closing, center)
        reach_pose = grasp_pose * sapien.Pose(p=offset)
        grasp_pose = grasp_pose * sapien.Pose(p=[0, 0, 0.01])
        actions = []
        reach_actions = self.move_to_pose(
            reach_pose,
            gripper_state=1,
            env_idx=env_idx,
        )
        actions.append(reach_actions)

        if reach_actions is None:
            logger.warning(
                f"Failed to reach the grasp pose for node `{actor.name}`, skipping grasping."
            )
            return None

        if not reach_target_only:
            grasp_actions = self.move_to_pose(
                grasp_pose,
                gripper_state=1,
                env_idx=env_idx,
            )
            actions.append(grasp_actions)
            close_actions = self.control_gripper(
                gripper_state=-1,
                env_idx=env_idx,
            )
            actions.append(close_actions)
            back_actions = self.move_to_pose(
                raw_tcp_pose,
                gripper_state=-1,
                env_idx=env_idx,
            )
            actions.append(back_actions)

        self.agent.reset(self.agent.init_qpos)

        return np.concatenate(actions, axis=0)

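The action arrays returned by `move_to_pose`, `control_gripper`, and `compute_grasp_action` share one layout, which is why the stages concatenate cleanly along axis 0: each row holds the arm joint targets plus one gripper command (+1 open, -1 close). A small sketch of that convention, assuming the 7-DoF Panda arm:

```python
import numpy as np

# Each control step is a row: 7 joint positions + 1 gripper command.
qpos = np.zeros(7)
open_step = np.hstack([qpos, 1])[None, ...]    # shape (1, 8), gripper open
close_step = np.hstack([qpos, -1])[None, ...]  # shape (1, 8), gripper close

# Stages (reach, grasp, close, retreat) stack along the step axis.
trajectory = np.concatenate([open_step, close_step], axis=0)  # shape (2, 8)
```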
@@ -1 +1 @@
VERSION = "v0.1.4"

@@ -24,6 +24,7 @@ from xml.dom.minidom import parseString

import numpy as np
import trimesh
from embodied_gen.data.convex_decomposer import decompose_convex_mesh
from embodied_gen.utils.gpt_clients import GPT_CLIENT, GPTclient
from embodied_gen.utils.process_media import render_asset3d
from embodied_gen.utils.tags import VERSION

@@ -84,6 +85,7 @@ class URDFGenerator(object):
        attrs_name: list[str] = None,
        render_dir: str = "urdf_renders",
        render_view_num: int = 4,
        decompose_convex: bool = False,
    ) -> None:
        if mesh_file_list is None:
            mesh_file_list = []

@@ -132,8 +134,9 @@ class URDFGenerator(object):
        Estimate the vertical projection of their real length based on its pose.
        For example:
        - A pen standing upright in the first image (aligned with the image's vertical axis),
        full body visible in the first image: → vertical height ≈ 0.14-0.20 m
        - A pen lying flat in the first image, or with either the tip or the tail facing the image
        (showing thickness or as a circle) → vertical height ≈ 0.018-0.025 m
        - A tilted pen in the first image (e.g., ~45° angle): vertical height ≈ 0.07-0.12 m
        - Use the rest of the views to help determine the object's 3D pose and orientation.
        Assume the object is in real-world scale and estimate the approximate vertical height

@@ -156,6 +159,7 @@ class URDFGenerator(object):
            "gs_model",
        ]
        self.attrs_name = attrs_name
        self.decompose_convex = decompose_convex

    def parse_response(self, response: str) -> dict[str, any]:
        lines = response.split("\n")

@@ -258,9 +262,24 @@ class URDFGenerator(object):
        # Update collision geometry
        collision = link.find("collision/geometry/mesh")
        if collision is not None:
            collision_mesh = os.path.join(self.output_mesh_dir, obj_name)
            if self.decompose_convex:
                try:
                    d_params = dict(
                        threshold=0.05, max_convex_hull=64, verbose=False
                    )
                    filename = f"{os.path.splitext(obj_name)[0]}_collision.ply"
                    output_path = os.path.join(mesh_folder, filename)
                    decompose_convex_mesh(
                        mesh_output_path, output_path, **d_params
                    )
                    collision_mesh = f"{self.output_mesh_dir}/{filename}"
                except Exception as e:
                    logger.warning(
                        f"Convex decomposition failed for {output_path}, {e}. "
                        "Using the original mesh for collision computation."
                    )
            collision.set("filename", collision_mesh)
            collision.set("scale", "1.0 1.0 1.0")

        # Update friction coefficients

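The decomposition branch above writes the convex parts to a sibling file whose name is derived from the visual mesh: the stem plus a `_collision.ply` suffix. The naming convention in isolation:

```python
import os

# Derive the collision-mesh filename from the visual mesh name, mirroring
# the URDFGenerator logic above: "pen.obj" -> "pen_collision.ply".
obj_name = "pen.obj"
filename = f"{os.path.splitext(obj_name)[0]}_collision.ply"
```

Keeping the decomposed mesh next to the visual mesh lets the URDF reference both with the same relative directory.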
@@ -9,10 +9,10 @@ PIP_INSTALL_PACKAGES=(
    "xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118"
    "flash-attn==2.7.0.post2 --no-build-isolation"
    "-r requirements.txt --use-deprecated=legacy-resolver"
    "utils3d@git+https://github.com/EasternJournalist/utils3d.git@9a4eb15"
    "clip@git+https://github.com/openai/CLIP.git"
    "segment-anything@git+https://github.com/facebookresearch/segment-anything.git@dca509f"
    "nvdiffrast@git+https://github.com/NVlabs/nvdiffrast.git@729261d"
    "kolors@git+https://github.com/HochCC/Kolors.git"
    "kaolin@git+https://github.com/NVIDIAGameWorks/kaolin.git@v0.16.0"
    "git+https://github.com/nerfstudio-project/gsplat.git@v1.5.3"

@@ -7,7 +7,7 @@ packages = ["embodied_gen"]

[project]
name = "embodied_gen"
version = "v0.1.4"
readme = "README.md"
license = "Apache-2.0"
license-files = ["LICENSE", "NOTICE"]

@@ -38,5 +38,6 @@ tyro
pyquaternion
shapely
sapien==3.0.0b1
coacd
mani_skill==3.0.0b21
typing_extensions==4.14.1

@@ -39,6 +39,7 @@ def test_urdf_convertor():
        "outputs/test_urdf/notebook/result/mesh/notebook.obj",
        "outputs/test_urdf/marker/result/mesh/marker.obj",
        "outputs/test_urdf/pen2/result/mesh/pen.obj",
        "outputs/test_urdf/pen3/result/mesh/pen.obj",
    ]
    for idx, mesh_path in enumerate(mesh_paths):
        filename = mesh_path.split("/")[-1].split(".")[0]
@@ -49,3 +50,19 @@ def test_urdf_convertor():
            # min_height=1.0,
            # max_height=1.2,
        )


def test_decompose_convex_mesh():
    urdf_gen = URDFGenerator(GPT_CLIENT, decompose_convex=True)
    mesh_paths = [
        "outputs/test_urdf/sample_0/mesh/pen.obj",
        "outputs/test_urdf/sample_1/mesh/notepad.obj",
        "outputs/test_urdf/sample_2/mesh/plate.obj",
    ]
    for idx, mesh_path in enumerate(mesh_paths):
        filename = mesh_path.split("/")[-1].split(".")[0]
        urdf_path = urdf_gen(
            mesh_path=mesh_path,
            output_root=f"outputs/test_urdf3/sample_{idx}",
            category=filename,
        )

tests/test_unit/test_mesh_process.py (new file, 44 lines)
@@ -0,0 +1,44 @@
# Project EmbodiedGen
#
# Copyright (c) 2025 Horizon Robotics. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License.


import logging
import tempfile
from time import time

import pytest
from embodied_gen.data.convex_decomposer import decompose_convex_mesh

logger = logging.getLogger(__name__)


@pytest.mark.parametrize(
    "input_mesh_path, max_convex_hull",
    [
        ("apps/assets/example_texture/meshes/robot_text.obj", 8),
        # ("apps/assets/example_texture/meshes/robot_text.obj", 32),
        # ("apps/assets/example_texture/meshes/robot_text.obj", 64),
        # ("apps/assets/example_texture/meshes/robot_text.obj", 128),
    ],
)
def test_decompose_convex_mesh(input_mesh_path, max_convex_hull):
    d_params = dict(
        threshold=0.05, max_convex_hull=max_convex_hull, verbose=False
    )
    with tempfile.NamedTemporaryFile(suffix=".ply", delete=True) as tmp_file:
        start_time = time()
        decompose_convex_mesh(input_mesh_path, tmp_file.name, **d_params)
        logger.info(f"Finished in {time() - start_time:.2f}s")