Atlas Gaussians Diffusion for 3D Generation

¹The University of Texas at Austin, ²Alibaba Group
ICLR 2025 (Spotlight)
*Indicates Equal Contribution

Abstract

Latent diffusion models have proven effective for developing novel 3D generation techniques. A key challenge in harnessing them is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, and each patch can decode 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large number of 3D Gaussians enables the generation of high-quality details. Moreover, due to the local awareness of the representation, the transformer-based decoding procedure operates at the patch level, ensuring efficiency. We train a variational autoencoder to learn the Atlas Gaussians representation, and then apply a latent diffusion model on its latent space to learn 3D generation. Experiments show that our approach outperforms prior art in feed-forward native 3D generation.

Further Information

Ours:

Atlas Gaussians Diffusion for 3D Generation (https://arxiv.org/abs/2408.13055)

Several concurrent works by others:

L3DG: Latent 3D Gaussian Diffusion (https://arxiv.org/abs/2410.13530)

DiffGS: Functional Gaussian Splatting Diffusion (https://arxiv.org/abs/2410.19657)

Structured 3D Latents for Scalable and Versatile 3D Generation (https://arxiv.org/abs/2412.01506)

GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation (https://arxiv.org/abs/2411.08033)

BibTeX

@inproceedings{yang2025atlas,
  title={Atlas Gaussians Diffusion for 3D Generation},
  author={Haitao Yang and Yuan Dong and Hanwen Jiang and Dejia Xu and Georgios Pavlakos and Qixing Huang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=H2Gxil855b}
}

TL;DR:
(1) A new representation that can generate a theoretically infinite number of 3D Gaussians (3DGS).
(2) We are among the first‡ to apply the VAE + LDM (Latent Diffusion Model) paradigm to 3DGS generation.

Atlas Gaussians representation

(Left) Atlas Gaussians \( \mathcal{A} \) model the shape as a union of patches, where each patch can decode 3D Gaussians. (Right) Each patch \( a_i \) is parameterized by a patch center \( x_i \) and patch features \( f_i \) and \( h_i \). The 3D Gaussians are decoded via UV-based sampling.
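
The idea of decoding an arbitrary number of Gaussians from a fixed-size patch can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: each patch is assumed to store four feature vectors at the corners of the UV unit square, features at a sampled (u, v) are obtained by bilinear interpolation (a stand-in for the learnable decoding function), geometry features \( f_i \) are assumed to drive positions and appearance features \( h_i \) the remaining attributes, and the linear maps `W_geo` and `W_app` are hypothetical toy decoders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bilinear_weights(uv):
    """Weights of the four UV-square corners (0,0), (1,0), (0,1), (1,1)."""
    u, v = uv[:, :1], uv[:, 1:]
    return np.concatenate(
        [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v], axis=1
    )

def decode_patch(x_i, f_i, h_i, W_geo, W_app, n_samples, rng):
    """Decode n_samples Gaussians from one patch (toy decoder, hypothetical).

    x_i: (3,) patch center; f_i, h_i: (4, d) corner features (assumed layout).
    """
    uv = rng.random((n_samples, 2))      # UV samples in [0, 1)^2
    w = bilinear_weights(uv)             # (n, 4)
    geo = w @ f_i                        # (n, d) interpolated geometry features
    app = w @ h_i                        # (n, d) interpolated appearance features
    raw = app @ W_app                    # (n, 11) raw appearance attributes
    quat = raw[:, 3:7]
    return {
        # Positions stay within 0.1 of the patch center (arbitrary toy radius).
        "position": x_i + 0.1 * np.tanh(geo @ W_geo),
        "scale": np.exp(raw[:, 0:3]),
        "rotation": quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8),
        "opacity": sigmoid(raw[:, 7:8]),
        "color": sigmoid(raw[:, 8:11]),
    }

rng = np.random.default_rng(0)
M, d = 8, 16                             # number of patches, feature width (toy values)
centers = rng.normal(size=(M, 3))
f = rng.normal(size=(M, 4, d))
h = rng.normal(size=(M, 4, d))
W_geo = 0.1 * rng.normal(size=(d, 3))
W_app = 0.1 * rng.normal(size=(d, 11))

for n in (4, 64):
    patches = [
        decode_patch(centers[i], f[i], h[i], W_geo, W_app, n, rng) for i in range(M)
    ]
    total = sum(p["position"].shape[0] for p in patches)
    print(n, total)
```

The patch parameters have a fixed size, yet the total number of decoded Gaussians is the number of patches times the number of UV samples per patch, so it can be raised at decoding time without changing the representation.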

VAE architecture

The proposed VAE architecture. CA denotes the cross-attention layer. For simplicity, the variational component of the VAE is omitted. The latent \( z_0 \) is used for latent diffusion.
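
For readers unfamiliar with the CA block, the following is a generic single-head cross-attention sketch. The caption does not state which tensors act as queries and which as context, so the shapes and roles below (a small set of latent queries attending to a larger set of input features) are hypothetical, as are the projection matrices `W_q`, `W_k`, and `W_v`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, W_q, W_k, W_v):
    """Single-head cross-attention: each query row attends over all context rows."""
    q = queries @ W_q                          # (n_q, d)
    k = context @ W_k                          # (n_c, d)
    v = context @ W_v                          # (n_c, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (n_q, n_c)
    return attn @ v, attn

rng = np.random.default_rng(0)
d = 32
input_feats = rng.normal(size=(1024, d))       # hypothetical per-point input features
latent_queries = rng.normal(size=(16, d))      # hypothetical learnable queries
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

latent, attn = cross_attention(latent_queries, input_feats, W_q, W_k, W_v)
print(latent.shape, attn.shape)
```

In this sketch the output has one row per query, which is how a cross-attention layer can compress a large input set into a fixed-size latent or, with the roles reversed, expand a latent into per-patch features.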

Analysis (diversity & controllability & originality)

(Left) Our generated results demonstrate significant diversity. (Right) Our generated results align closely with the text prompts, allowing for strong controllability. In the second row of each group, we present the nearest neighbors (NN) from the training dataset.

Text-conditioned 3D generation

Comparison of text-conditioned 3D generation on Objaverse between baseline approaches and our method. From left to right: GVGEN (He et al., 2024), LN3Diff (Lan et al., 2024), LGM (Tang et al., 2024), Shap-E (Jun & Nichol, 2023), and our method.
