EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

🌐 Project Page 📄 arXiv 🎥 Video 🤗 Hugging Face 🤗 Hugging Face 🤗 Hugging Face

EmbodiedGen is a toolkit for generating diverse, interactive 3D worlds composed of generative 3D assets with plausible physics. It leverages generative AI to address the challenge of generalization in embodied-intelligence research. EmbodiedGen is composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.

Overall Framework

🚀 Quick Start

Setup Environment

git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y
conda activate embodiedgen
bash install.sh

🟢 Setup GPT Agent

Update the API key in embodied_gen/utils/gpt_config.yaml.

You can choose between two backends for the GPT agent:

  • gpt-4o (Recommended): Use this if you have access to Azure OpenAI.
  • qwen2.5-vl: An alternative with free usage via OpenRouter (50 free requests per day). Apply for a free key and set api_key in embodied_gen/utils/gpt_config.yaml.
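
Before launching a service, it can help to confirm that a key has actually been filled in. The snippet below is a minimal sketch and not part of EmbodiedGen: the helper name `find_empty_api_keys` is hypothetical, and it assumes keys appear as plain `api_key: value` lines. Check the real layout of gpt_config.yaml before relying on it.

```python
import re
from pathlib import Path

# Assumption: keys appear as simple "api_key: value" lines in the YAML file.
_API_KEY_LINE = re.compile(r"^\s*api_key\s*:\s*(.*?)\s*(?:#.*)?$")


def find_empty_api_keys(yaml_text: str) -> list[int]:
    """Return 1-based line numbers of api_key entries that have no value."""
    empty = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = _API_KEY_LINE.match(line)
        if match is None:
            continue
        # Treat quoted empty strings ("" or '') as empty too.
        value = match.group(1).strip("\"'")
        if not value:
            empty.append(lineno)
    return empty


if __name__ == "__main__":
    config_path = Path("embodied_gen/utils/gpt_config.yaml")
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        for lineno in find_empty_api_keys(text):
            print(f"{config_path}:{lineno}: api_key is empty")
```

The check is deliberately text-based so that it needs no YAML library. It only detects empty values, not invalid or expired keys.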

🖼️ Image-to-3D

🤗 Hugging Face Generate a physically plausible 3D asset from an input image.

Service

Run the image-to-3D generation service locally. The first run will download required models.

# Run in foreground
python apps/image_to_3d.py
# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &

API

Generate a 3D model from an image using the command-line API.

python3 embodied_gen/scripts/imageto3d.py \
    --image_path apps/assets/example_image/sample_04.jpg apps/assets/example_image/sample_19.jpg \
    --output_root outputs/imageto3d/

# See the results (.urdf, mesh.obj, mesh.glb, gs.ply) in ${output_root}/sample_xx/result
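
After a batch run, the output folders can be checked for completeness. The following is a rough sketch under stated assumptions, not an EmbodiedGen utility: the helper `missing_outputs` is hypothetical, it assumes one `result` folder per sample as described in the comment above, and it matches files by suffix anywhere under that folder because the exact file names are not confirmed here.

```python
from pathlib import Path

# Suffixes taken from the result comment above (.urdf, .obj, .glb, .ply).
# Assumption: one file per suffix somewhere under <output_root>/<sample>/result.
EXPECTED_SUFFIXES = (".urdf", ".obj", ".glb", ".ply")


def missing_outputs(output_root: str) -> dict[str, list[str]]:
    """Map each sample folder name to the expected suffixes it lacks."""
    report = {}
    for result_dir in sorted(Path(output_root).glob("*/result")):
        if not result_dir.is_dir():
            continue
        missing = [
            suffix
            for suffix in EXPECTED_SUFFIXES
            if not any(result_dir.rglob("*" + suffix))
        ]
        report[result_dir.parent.name] = missing
    return report


if __name__ == "__main__":
    for sample, missing in missing_outputs("outputs/imageto3d/").items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{sample}: {status}")
```

A sample with an empty list has at least one file of every expected type. Any other entry lists the suffixes that were not found.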

📝 Text-to-3D

🤗 Hugging Face Create 3D assets from text descriptions across a wide range of geometries and styles.

Service

Run the text-to-3D generation service locally.

python apps/text_to_3d.py

API

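# The Chinese prompts below translate to "a globe with a wooden base and latitude/longitude lines"
# and "an orange electric hand drill with wear details".
bash embodied_gen/scripts/textto3d.sh \
    --prompts "small bronze figurine of a lion" "带木质底座,具有经纬线的地球仪" "橙色电动手钻,有磨损细节" \
    --output_root outputs/textto3d/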
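
When prompts come from a list or a file, the same invocation can be assembled programmatically. This is a minimal sketch, assuming only the `--prompts` and `--output_root` flags shown above; the helper `build_textto3d_command` and the second example prompt are hypothetical and not part of EmbodiedGen.

```python
import shlex


def build_textto3d_command(prompts: list[str], output_root: str) -> list[str]:
    """Build an argument list mirroring the textto3d.sh invocation above."""
    # Drop blank entries and surrounding whitespace (e.g. from a text file).
    cleaned = [p.strip() for p in prompts if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty prompt is required")
    return [
        "bash", "embodied_gen/scripts/textto3d.sh",
        "--prompts", *cleaned,
        "--output_root", output_root,
    ]


if __name__ == "__main__":
    cmd = build_textto3d_command(
        ["small bronze figurine of a lion", "  ", "red toy car\n"],
        "outputs/textto3d/",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To execute it instead: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces, quotes, or non-ASCII text.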

🎨 Texture Generation

🤗 Hugging Face Generate visually rich textures for a 3D mesh.

Service

Run the texture generation service locally.

python apps/texture_edit.py

API

Generate textures for a 3D mesh using a text prompt.

# The prompt translates to: a red, realistic-style robot holding a sign that reads "Hello".
bash embodied_gen/scripts/texture_gen.sh \
    --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
    --prompt "举着牌子的红色写实风格机器人牌子上写着“Hello”" \
    --output_root "outputs/texture_gen/" \
    --uuid "robot_text"

🌍 3D Scene Generation

🚧 Coming Soon


⚙️ Articulated Object Generation

🚧 Coming Soon


🏞️ Layout Generation

🚧 Coming Soon


📚 Citation

If you use EmbodiedGen in your research or projects, please cite:

Coming Soon

🙌 Acknowledgement

EmbodiedGen builds upon the following amazing projects and models: 🌟 Trellis | 🌟 Hunyuan-Delight | 🌟 Segment Anything | 🌟 Rembg | 🌟 RMBG-1.4 | 🌟 Stable Diffusion x4 | 🌟 Real-ESRGAN | 🌟 Kolors | 🌟 ChatGLM3 | 🌟 Aesthetic Score | 🌟 Pano2Room | 🌟 Diffusion360 | 🌟 Kaolin | 🌟 diffusers | 🌟 gsplat | 🌟 GPT: QWEN2.5VL, GPT4o


⚖️ License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.