# EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

[![🌐 Project Page](https://img.shields.io/badge/🌐-Project_Page-blue)](https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html)
[![📄 arXiv](https://img.shields.io/badge/📄-arXiv-b31b1b)](https://arxiv.org/abs/2506.10600)
[![🎥 Video](https://img.shields.io/badge/🎥-Video-red)](https://www.youtube.com/watch?v=rG4odybuJRk)
[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Image_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D)
[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Text_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D)
[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Texture_Gen_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen)

**EmbodiedGen** is a toolkit for generating diverse, interactive 3D worlds composed of generative 3D assets with plausible physics. It leverages generative AI to address the generalization challenges in embodied intelligence research. EmbodiedGen is composed of six key modules: `Image-to-3D`, `Text-to-3D`, `Texture Generation`, `Articulated Object Generation`, `Scene Generation` and `Layout Generation`.

*Figure: Overall Framework*

---

## ✨ Table of Contents of EmbodiedGen

- [🖼️ Image-to-3D](#image-to-3d)
- [📝 Text-to-3D](#text-to-3d)
- [🎨 Texture Generation](#texture-generation)
- [🌍 3D Scene Generation](#3d-scene-generation)
- [⚙️ Articulated Object Generation](#articulated-object-generation)
- [🏞️ Layout Generation](#layout-generation)

## 🚀 Quick Start

### ✅ Setup Environment

```sh
git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.0
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y
conda activate embodiedgen
bash install.sh
```

### 🟢 Setup GPT Agent

Update the API key in `embodied_gen/utils/gpt_config.yaml`.

You can choose between two backends for the GPT agent:

- **`gpt-4o`** (recommended): use this if you have access to **Azure OpenAI**.
- **`qwen2.5-vl`**: an alternative with free usage via OpenRouter. Apply for a free key [here](https://openrouter.ai/settings/keys) and set `api_key` in `embodied_gen/utils/gpt_config.yaml` (50 free requests per day).

---
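Before launching a job, it can help to confirm that the key from the GPT agent setup above has actually been filled in. The following is a minimal sketch and not part of EmbodiedGen: it assumes the YAML file has already been parsed into nested dictionaries, and that an unset key is empty or holds a placeholder. The helper name `find_unset_api_keys`, the placeholder values, and the sample structure are all hypothetical; only the `api_key` field name comes from the setup instructions.

```python
from typing import Any, List

# Hypothetical placeholder values that would indicate an unset key.
PLACEHOLDERS = {"", "xxx", "your_api_key", "none"}


def find_unset_api_keys(config: Any, path: str = "") -> List[str]:
    """Return the dotted path of every `api_key` entry that looks unset."""
    unset: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "api_key":
                text = "" if value is None else str(value).strip()
                if text.lower() in PLACEHOLDERS:
                    unset.append(child)
            else:
                # Recurse into nested sections of the config.
                unset.extend(find_unset_api_keys(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            unset.extend(find_unset_api_keys(item, f"{path}[{index}]"))
    return unset


# Illustrative structure only; the real layout of gpt_config.yaml may differ.
example = {
    "agent_type": "qwen2.5-vl",
    "gpt-4o": {"api_key": "xxx"},
    "qwen2.5-vl": {"api_key": "sk-or-example"},
}
print(find_unset_api_keys(example))
```

With PyYAML installed, the parsed mapping could come from `yaml.safe_load` applied to the opened `embodied_gen/utils/gpt_config.yaml` file.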

## 🖼️ Image-to-3D

[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Image_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D)

Generate physically plausible 3D assets from an input image.

### Service

Run the image-to-3D generation service locally. The first run downloads the required models.

```sh
# Run in the foreground
python apps/image_to_3d.py
# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &
```

### API

Generate 3D models from one or more images using the command-line API. Models are downloaded automatically on the first run, which may take a while.

```sh
python3 embodied_gen/scripts/imageto3d.py \
    --image_path apps/assets/example_image/sample_04.jpg apps/assets/example_image/sample_19.jpg \
    --output_root outputs/imageto3d/

# See results (.urdf, mesh.obj, mesh.glb, gs.ply) in ${output_root}/sample_xx/result
```

---
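To script the Image-to-3D command-line API over a folder of images, one option is a thin Python wrapper. The sketch below is illustrative and not part of EmbodiedGen: `build_imageto3d_command` and `summarize_results` are hypothetical helpers, the only flags used are `--image_path` and `--output_root` from the example above, and matching result files by extension (`.urdf`, `.obj`, `.glb`, `.ply`) is an assumption based on the result comment in that example.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
# Assumption: one file per extension is expected in each result folder.
RESULT_SUFFIXES = (".urdf", ".obj", ".glb", ".ply")


def build_imageto3d_command(image_dir: str, output_root: str) -> List[str]:
    """Build the imageto3d.py command for every image found in image_dir."""
    images = sorted(
        str(p) for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise ValueError(f"no images found in {image_dir}")
    return [
        "python3", "embodied_gen/scripts/imageto3d.py",
        "--image_path", *images,
        "--output_root", output_root,
    ]


def summarize_results(output_root: str) -> Dict[str, List[str]]:
    """Map each sample folder to the suffixes missing from <sample>/result."""
    missing: Dict[str, List[str]] = {}
    for sample_dir in sorted(Path(output_root).iterdir()):
        result_dir = sample_dir / "result"
        if not result_dir.is_dir():
            continue  # skip entries that have no result folder yet
        found = {p.suffix.lower() for p in result_dir.rglob("*") if p.is_file()}
        missing[sample_dir.name] = [s for s in RESULT_SUFFIXES if s not in found]
    return missing
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; an empty list for a sample in `summarize_results` means every expected file type was found.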

## 📝 Text-to-3D

[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Text_to_3D_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D)

Create 3D assets from text descriptions, covering a wide range of geometries and styles.

### Service

Run the text-to-3D generation service locally.

```sh
python apps/text_to_3d.py
```

### API

Models are downloaded automatically; see `download_kolors_weights`.

```sh
# The last prompt is Chinese for "orange electric hand drill with wear details".
bash embodied_gen/scripts/textto3d.sh \
    --prompts "small bronze figurine of a lion" "A globe with wooden base and latitude and longitude lines" "橙色电动手钻,有磨损细节" \
    --output_root outputs/textto3d/
```

---
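When the prompt list for the Text-to-3D API grows, keeping prompts in a text file may be easier than editing the shell command. This is a small sketch under stated assumptions: the one-prompt-per-line format with `#` comments and the helpers `load_prompts` and `build_textto3d_command` are hypothetical, and only the `--prompts` and `--output_root` flags shown above are used.

```python
from typing import Iterable, List


def load_prompts(lines: Iterable[str]) -> List[str]:
    """Keep non-empty, non-comment lines, stripped of surrounding whitespace."""
    prompts = []
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            prompts.append(text)
    return prompts


def build_textto3d_command(prompts: List[str], output_root: str) -> List[str]:
    """Build the textto3d.sh command; each prompt becomes one argument."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return [
        "bash", "embodied_gen/scripts/textto3d.sh",
        "--prompts", *prompts,
        "--output_root", output_root,
    ]


sample = [
    "small bronze figurine of a lion\n",
    "\n",
    "# skipped comment\n",
    "A globe with wooden base and latitude and longitude lines\n",
]
command = build_textto3d_command(load_prompts(sample), "outputs/textto3d/")
print(command)
```

Passing the command as a list (for example to `subprocess.run`) avoids shell-quoting problems with prompts that contain spaces or quotes.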

## 🎨 Texture Generation

[![🤗 Hugging Face](https://img.shields.io/badge/🤗-Texture_Gen_Demo-blue)](https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen)

Generate visually rich textures for a 3D mesh.

### Service

Run the texture generation service locally.

```sh
python apps/texture_edit.py
```

### API

Models are downloaded automatically; see `download_kolors_weights` and `geo_cond_mv`.

```sh
# The prompt is Chinese for: a realistic-style robot holding a sign, with big eyes; the sign reads "Hello".
bash embodied_gen/scripts/texture_gen.sh \
    --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
    --prompt "举着牌子的写实风格机器人,大眼睛,牌子上写着“Hello”的文字" \
    --output_root "outputs/texture_gen/" \
    --uuid "robot_text"
```

---
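Texturing several meshes can follow the same pattern as the single texture generation call above. A rough sketch, assuming each mesh's file stem is an acceptable `--uuid` (as with `robot_text` in the example) and that one prompt is supplied per mesh. `texture_jobs` is a hypothetical helper, and only the four flags shown above are used.

```python
import shlex
from pathlib import Path
from typing import Dict, List


def texture_jobs(prompts: Dict[str, str], output_root: str) -> List[List[str]]:
    """Build one texture_gen.sh command per (mesh path, prompt) pair."""
    jobs = []
    for mesh_path, prompt in prompts.items():
        jobs.append([
            "bash", "embodied_gen/scripts/texture_gen.sh",
            "--mesh_path", mesh_path,
            "--prompt", prompt,
            "--output_root", output_root,
            "--uuid", Path(mesh_path).stem,  # assumption: file stem doubles as uuid
        ])
    return jobs


jobs = texture_jobs(
    {
        "apps/assets/example_texture/meshes/robot_text.obj":
            "A realistic robot holding a sign that says Hello",
    },
    "outputs/texture_gen/",
)
for job in jobs:
    # Print a shell-quoted command line that can be reviewed before running.
    print(shlex.join(job))
```

Printing the quoted commands first makes it easy to check prompts and paths before handing each list to `subprocess.run`.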

## 🌍 3D Scene Generation

🚧 *Coming Soon*

---

## ⚙️ Articulated Object Generation

🚧 *Coming Soon*

---

## 🏞️ Layout Generation

🚧 *Coming Soon*

---

## 📚 Citation

If you use EmbodiedGen in your research or projects, please cite:

```bibtex
@misc{wang2025embodiedgengenerative3dworld,
      title={EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence},
      author={Xinjie Wang and Liu Liu and Yu Cao and Ruiqi Wu and Wenkang Qin and Dehui Wang and Wei Sui and Zhizhong Su},
      year={2025},
      eprint={2506.10600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.10600},
}
```

---

## 🙌 Acknowledgement

EmbodiedGen builds upon the following amazing projects and models:
🌟 [Trellis](https://github.com/microsoft/TRELLIS) |
🌟 [Hunyuan-Delight](https://huggingface.co/tencent/Hunyuan3D-2/tree/main/hunyuan3d-delight-v2-0) |
🌟 [Segment Anything](https://github.com/facebookresearch/segment-anything) |
🌟 [Rembg](https://github.com/danielgatis/rembg) |
🌟 [RMBG-1.4](https://huggingface.co/briaai/RMBG-1.4) |
🌟 [Stable Diffusion x4](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) |
🌟 [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) |
🌟 [Kolors](https://github.com/Kwai-Kolors/Kolors) |
🌟 [ChatGLM3](https://github.com/THUDM/ChatGLM3) |
🌟 [Aesthetic Score](http://captions.christoph-schuhmann.de/aesthetic_viz_laion_sac+logos+ava1-l14-linearMSE-en-2.37B.html) |
🌟 [Pano2Room](https://github.com/TrickyGo/Pano2Room) |
🌟 [Diffusion360](https://github.com/ArcherFMY/SD-T2I-360PanoImage) |
🌟 [Kaolin](https://github.com/NVIDIAGameWorks/kaolin) |
🌟 [diffusers](https://github.com/huggingface/diffusers) |
🌟 [gsplat](https://github.com/nerfstudio-project/gsplat) |
🌟 [QWEN2.5VL](https://github.com/QwenLM/Qwen2.5-VL) |
🌟 [GPT4o](https://platform.openai.com/docs/models/gpt-4o)

---

## ⚖️ License

This project is licensed under the [Apache License 2.0](LICENSE). See the `LICENSE` file for details.