# *EmbodiedGen*: Towards a Generative 3D World Engine for Embodied Intelligence
[Project Page](https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html) | [arXiv Paper](https://arxiv.org/abs/2506.10600) | [Video](https://www.youtube.com/watch?v=rG4odybuJRk) | [Image-to-3D Demo](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D) | [Text-to-3D Demo](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D) | [Texture Generation Demo](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen) | [WeChat Article](https://mp.weixin.qq.com/s/HH1cPBhK2xcDbyCK4BBTbw)
> ***EmbodiedGen*** is a generative engine for creating diverse, interactive 3D worlds composed of high-quality 3D assets (mesh and 3DGS) with plausible physics. It leverages generative AI to address the generalization challenges in embodied-intelligence research.
> It is composed of six key modules: `Image-to-3D`, `Text-to-3D`, `Texture Generation`, `Articulated Object Generation`, `Scene Generation` and `Layout Generation`.
---
## Table of Contents of EmbodiedGen
- [Image-to-3D](#image-to-3d)
- [Text-to-3D](#text-to-3d)
- [Texture Generation](#texture-generation)
- [3D Scene Generation](#3d-scene-generation)
- [Articulated Object Generation](#articulated-object-generation)
- [Layout (Interactive 3D Worlds) Generation](#layout-generation)
## Quick Start
### Setup Environment
```sh
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.1
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y
conda activate embodiedgen
bash install.sh
```
### Setup GPT Agent
Update the API key in file: `embodied_gen/utils/gpt_config.yaml`.
You can choose between two backends for the GPT agent:
- **`gpt-4o`** (recommended): use this if you have access to **Azure OpenAI**.
- **`qwen2.5-vl`**: a free alternative via OpenRouter (50 free requests per day). Apply for a free key [here](https://openrouter.ai/settings/keys) and set `api_key` in `embodied_gen/utils/gpt_config.yaml`.
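Before launching a pipeline, it can help to confirm that the key has actually been filled in. The sketch below is a minimal check under loud assumptions: it does a plain-text scan for lines of the form `api_key: ...` and reports the ones that look empty or placeholder-like. The helper name `find_unset_api_keys` and the placeholder strings are hypothetical, and the real layout of `gpt_config.yaml` may differ.

```python
import re
from pathlib import Path

# Hypothetical placeholder values that suggest a key was never filled in.
PLACEHOLDERS = {"", "xxx", "your_api_key", "null", "none"}


def find_unset_api_keys(yaml_text: str) -> list[int]:
    """Return 1-based line numbers of `api_key:` entries that look unset.

    This is a plain-text scan, not a YAML parser; it only assumes that keys
    are written as `api_key: <value>` on a single line.
    """
    unset = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = re.match(r"\s*api_key\s*:\s*(.*)$", line)
        if match is None:
            continue
        # Drop a trailing comment, surrounding whitespace, and quotes.
        value = match.group(1).split("#", 1)[0].strip().strip("\"'")
        if value.lower() in PLACEHOLDERS:
            unset.append(lineno)
    return unset


if __name__ == "__main__":
    config = Path("embodied_gen/utils/gpt_config.yaml")
    if config.exists():
        for lineno in find_unset_api_keys(config.read_text(encoding="utf-8")):
            print(f"{config}:{lineno}: api_key looks unset")
```

Run from the repository root, the script prints nothing when every `api_key` entry it finds has a non-placeholder value.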
---
## Image-to-3D
### Service
Run the image-to-3D generation service locally.
Models are downloaded automatically on first run; please be patient.
```sh
# Run in foreground
python apps/image_to_3d.py
# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &
```
### API
Generate physically plausible 3D assets from image input via the command-line API.
```sh
img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/example_image/sample_19.jpg \
--n_retry 2 --output_root outputs/imageto3d
# Results (.urdf, mesh.obj, mesh.glb, gs.ply) are written to ${output_root}/sample_xx/result
```
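To check a batch run, one option is to walk the output tree and list which result files each sample produced. This is a minimal sketch that assumes only the `${output_root}/<sample>/result` layout noted in the comment above. The helper name `summarize_results` is hypothetical, and the exact file names may differ between versions, so the check goes by file extension only.

```python
from pathlib import Path

# File types mentioned in the CLI comment: URDF, OBJ/GLB meshes, 3DGS PLY.
EXPECTED_SUFFIXES = (".urdf", ".obj", ".glb", ".ply")


def summarize_results(output_root: str) -> dict[str, list[str]]:
    """Map each sample directory name to the expected suffixes it is missing."""
    missing = {}
    for result_dir in sorted(Path(output_root).glob("*/result")):
        # Collect the extensions of all files below the result directory.
        found = {p.suffix.lower() for p in result_dir.rglob("*") if p.is_file()}
        missing[result_dir.parent.name] = [
            s for s in EXPECTED_SUFFIXES if s not in found
        ]
    return missing


if __name__ == "__main__":
    for sample, gaps in summarize_results("outputs/imageto3d").items():
        status = "ok" if not gaps else "missing " + ", ".join(gaps)
        print(f"{sample}: {status}")
```

A sample reported as missing files is a candidate for a rerun with a higher `--n_retry`.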
---
## Text-to-3D
### Service
Deploy the text-to-3D generation service locally.
The service uses a text-to-image model based on Kolors, which supports both Chinese and English prompts.
Models are downloaded automatically on first run; please be patient.
```sh
python apps/text_to_3d.py
```
### API
`text3d-cli` uses a text-to-image model based on SD3.5 Medium, which accepts English prompts only.
Using it requires accepting the [model license](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) on Hugging Face; the models are then downloaded automatically. Models with more permissive licenses can be found in `embodied_gen/models/image_comm_model.py`.
For large-scale 3D asset generation, set `--n_pipe_retry=2` so that automatic quality checks and retries keep end-to-end asset usability high. For more diverse results, do not set `--seed_img`.
```sh
text3d-cli --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
--n_image_retry 2 --n_asset_retry 2 --n_pipe_retry 1 --seed_img 0 \
--output_root outputs/textto3d
```
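For batch runs it can be convenient to assemble the command programmatically. The following sketch builds a `text3d-cli` argument list using only the flags shown above. The helper `build_text3d_cmd` is hypothetical and not part of EmbodiedGen. Following the note on diversity, `--seed_img` is left out unless a seed is given explicitly.

```python
import shlex


def build_text3d_cmd(prompts, output_root, n_pipe_retry=2, seed_img=None):
    """Build a text3d-cli argument list from the flags shown in this README."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    cmd = ["text3d-cli", "--prompts", *prompts]
    cmd += ["--n_image_retry", "2", "--n_asset_retry", "2"]
    cmd += ["--n_pipe_retry", str(n_pipe_retry)]
    if seed_img is not None:  # omit the seed for more diverse results
        cmd += ["--seed_img", str(seed_img)]
    cmd += ["--output_root", output_root]
    return cmd


if __name__ == "__main__":
    cmd = build_text3d_cmd(["A globe with wooden base"], "outputs/textto3d")
    # Print a shell-quoted version so the command can be inspected first.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the CLI is installed. Keeping the arguments as a list avoids shell-quoting problems with prompts that contain spaces.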
Alternatively, use the script below, whose text-to-image model is based on Kolors.
```sh
# The third prompt is Chinese for "orange electric hand drill with wear details".
bash embodied_gen/scripts/textto3d.sh \
--prompts "small bronze figurine of a lion" "A globe with wooden base and latitude and longitude lines" "橙色电动手钻，有磨损细节" \
--output_root outputs/textto3d_k
```
---
## Texture Generation
### Service
Run the texture generation service locally.
Models are downloaded automatically on first run; see `download_kolors_weights` and `geo_cond_mv`.
```sh
python apps/texture_edit.py
```
### API
Supports both Chinese and English prompts.
```sh
# The prompt is Chinese for "a realistic-style robot holding a sign, big eyes, the sign reads 'Hello'".
bash embodied_gen/scripts/texture_gen.sh \
--mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
--prompt "举着牌子的写实风格机器人，大眼睛，牌子上写着“Hello”的文字" \
--output_root "outputs/texture_gen/robot_text"
bash embodied_gen/scripts/texture_gen.sh \
--mesh_path "apps/assets/example_texture/meshes/horse.obj" \
--prompt "A gray horse head with flying mane and brown eyes" \
--output_root "outputs/texture_gen/gray_horse"
```
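When texturing many meshes, the per-mesh commands can be planned in one place. The sketch below turns `(mesh_path, prompt)` pairs into argument lists for `texture_gen.sh`, using only the three flags shown above. The helper `plan_texture_jobs` is hypothetical, and naming each output folder after the mesh file stem is an assumption made for convenience, not something the script requires.

```python
from pathlib import Path


def plan_texture_jobs(jobs, output_base="outputs/texture_gen"):
    """Turn (mesh_path, prompt) pairs into texture_gen.sh argument lists.

    Assumption: each mesh gets its own output folder named after the mesh
    file stem, e.g. horse.obj -> outputs/texture_gen/horse.
    """
    commands = []
    for mesh_path, prompt in jobs:
        out_dir = Path(output_base) / Path(mesh_path).stem
        commands.append([
            "bash", "embodied_gen/scripts/texture_gen.sh",
            "--mesh_path", str(mesh_path),
            "--prompt", prompt,
            "--output_root", out_dir.as_posix(),
        ])
    return commands
```

Each returned list can be handed to `subprocess.run` one at a time, which keeps GPU memory use predictable.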
---
## For Developers
```sh
pip install ".[dev]" && pre-commit install
python -m pytest # All unit tests must pass.
```
## Citation
If you use EmbodiedGen in your research or projects, please cite:
```bibtex
@misc{wang2025embodiedgengenerative3dworld,
title={EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence},
author={Xinjie Wang and Liu Liu and Yu Cao and Ruiqi Wu and Wenkang Qin and Dehui Wang and Wei Sui and Zhizhong Su},
year={2025},
eprint={2506.10600},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2506.10600},
}
```
---
## Acknowledgement
EmbodiedGen builds upon the following amazing projects and models:
[Trellis](https://github.com/microsoft/TRELLIS) | [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) | [Segment Anything](https://github.com/facebookresearch/segment-anything) | [Rembg](https://github.com/danielgatis/rembg) | [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) | [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) | [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) | [Kolors](https://github.com/Kwai-Kolors/Kolors) | [ChatGLM3](https://github.com/THUDM/ChatGLM3) | [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) | [Pano2Room](https://github.com/TrickyGo/Pano2Room) | [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) | [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) | [diffusers](https://github.com/huggingface/diffusers) | [gsplat](https://github.com/nerfstudio-project/gsplat) | [QWEN-2.5VL](https://github.com/QwenLM/Qwen2.5-VL) | [GPT4o](https://platform.openai.com/docs/models/gpt-4o) | [SD3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
---
## License
This project is licensed under the [Apache License 2.0](LICENSE). See the `LICENSE` file for details.