TPA3D: Triplane Attention for Fast Text-to-3D Generation

Bin-Shih Wu^*, Hong-En Chen^*, Sheng-Yu Huang, Yu-Chiang Frank Wang

National Taiwan University, NVIDIA
ECCV 2024
^*Indicates Equal Contribution


Examples of chair manipulation by adding different detailed text descriptions.

Example text-guided 3D generation results of TPA3D.
(a) muscle car (b) pickup truck (c) sofa (d) office chair (e) scooter (f) dirt bike

Qualitative comparisons with SDS-based methods.

Method

Overview of TPA3D for fast text-guided 3D generation. Taking sentence-level and word-level features as inputs, TPA3D uses a generator G and triplane attention (TPA) modules to predict the triplane features used for 3D textured mesh generation, so that 3D content information is properly observed. Following GET3D, each G contains separate branches for geometry and texture synthesis. During training, InstructBLIP produces pseudo captions from rendered images, and CLIP extracts the text features from those captions.

Design of Triplane Attention (TPA). TPA first applies plane-wise self-attention and cross-plane attention to the 3D triplane features to enforce intra-plane consistency and 3D spatial connectivity, respectively. Cross-word attention is then applied to exploit word-level features and incorporate detailed information from the text.
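
The three attention stages described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes each plane is flattened into a sequence of tokens, omits learned query/key/value projections, normalization and multi-head splitting, and adds residual connections as a guess. The names `attention` and `tpa_block` are hypothetical, and the real cross-plane attention may restrict which tokens of the other planes are attended to.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention: (Nq, d), (Nk, d), (Nk, d) -> (Nq, d).
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values

def tpa_block(planes, words):
    """Illustrative TPA-style block (assumed layout, no learned weights).

    planes: (3, N, d) array, N flattened tokens for each of the three planes.
    words:  (L, d) array of word-level text features.
    """
    # 1) Plane-wise self-attention: each plane attends only to itself.
    planes = np.stack([p + attention(p, p, p) for p in planes])

    # 2) Cross-plane attention: each plane queries the other two planes.
    crossed = []
    for i in range(3):
        others = np.concatenate([planes[j] for j in range(3) if j != i])
        crossed.append(planes[i] + attention(planes[i], others, others))
    planes = np.stack(crossed)

    # 3) Cross-word attention: plane tokens query the word-level features.
    return np.stack([p + attention(p, words, words) for p in planes])

# Toy usage with random features (shapes are arbitrary assumptions).
rng = np.random.default_rng(0)
planes = rng.normal(size=(3, 16, 8))   # 3 planes, 16 tokens, 8 channels
words = rng.normal(size=(5, 8))        # 5 word tokens
refined = tpa_block(planes, words)
```

The ordering mirrors the caption: the first two stages only mix information within and across planes, so the word features in the last stage are attended to by tokens that already carry cross-plane context.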

Quantitative Results

Quantitative results in terms of (a) FID and (b) CLIP R-Precision@5. Compared to TAPS3D, which uses only sentence-level features, TPA3D performs additional word-level refinement, yielding better visual quality and improved alignment between generated shapes and the given text prompts.
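
For readers unfamiliar with R-Precision@5, the following is a rough sketch of how a top-k retrieval score of this kind can be computed from precomputed embeddings. It assumes that row i of `image_emb` (an embedding of a rendering of shape i) is paired with row i of `text_emb` (its prompt), and that all prompts in the batch form the candidate pool; the exact evaluation protocol used for the reported numbers is not stated here, and `r_precision_at_k` is a hypothetical helper name.

```python
import numpy as np

def r_precision_at_k(image_emb, text_emb, k=5):
    """Fraction of images whose paired prompt ranks in the top k by cosine similarity.

    image_emb: (N, d) embeddings of rendered shapes (assumed precomputed).
    text_emb:  (N, d) embeddings of prompts; row i is the prompt for image i.
    """
    # Normalize so that the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = img @ txt.T                          # (N, N) image-to-text similarities
    top_k = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar prompts
    truth = np.arange(sim.shape[0])[:, None]
    hits = (top_k == truth).any(axis=1)        # is the paired prompt among them?
    return float(hits.mean())

# Toy check: perfectly aligned embeddings retrieve every paired prompt.
aligned = np.eye(8)
score_aligned = r_precision_at_k(aligned, aligned, k=5)
```

A higher score means the generated shapes are more often matched back to their own prompts, which is the sense of "alignment" in the caption above.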

More Qualitative Results

More manipulation examples of adding different detailed text descriptions.

The interpolation results of our TPA3D.

Ablation Study

Ablation study on components in TPA blocks. To verify the claimed function of each component, we remove either cross-plane attention or cross-word attention from the TPA blocks, one at a time.

The architectures for the ablation study on components in TPA blocks. (a) Without cross-plane attention, the word features may be attended to regions with incomplete spatial information, which lowers visual quality. (b) Without cross-word attention, the triplanes lack the detailed information in the description and contain only global information from the sentence features.

Ablation study for assessing the efficacy of TPA blocks in geometry and texture branches.

Ablation study on block numbers of TPA.

Ablation study on training objectives of TPA (CLIP similarity score & mismatching objective).
