Name That Part: 3D Part Segmentation and Naming

Setup & notation. We represent a 3D shape as a point set $\mathcal{P}=\{\mathbf{x}_i\}_{i=1}^N$ (sampled from a mesh or point cloud). The model predicts $K$ Partlets, each with mask logits $\mathbf{m}_k\in\mathbb{R}^{N}$ and a text embedding $\hat{\mathbf{z}}_k\in\mathbb{R}^{d_t}$. The ground truth provides $A$ part masks $\mathbf{m}^{\mathrm{gt}}_a\in\{0,1\}^{N}$ with text embeddings $\hat{\mathbf{t}}_a\in\mathbb{R}^{d_t}$. A differentiable set-matching step (Sinkhorn) yields an assignment $\pi(k)\in\{1,\ldots,A\}\cup\{\emptyset\}$; let $\mathcal{M}=\{k:\pi(k)\neq\emptyset\}$ denote the set of matched Partlets.
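The matching step can be illustrated with a small Sinkhorn sketch. The source does not specify the cost function, the regularization strength, or how the $\emptyset$ option is realized, so everything below is an assumption: a hypothetical cost matrix `cost[k][a]`, an extra "no-part" column with constant cost `no_part_cost`, an entropic temperature `eps`, and a row-wise argmax to read off a hard assignment. Indices are 0-based and `None` stands for $\emptyset$.

```python
import math

def sinkhorn_assign(cost, no_part_cost=0.5, eps=0.1, iters=50):
    """Soft-match K Partlets to A parts plus a hypothetical "no-part" slot.

    cost[k][a] is an assumed matching cost (lower is better).
    Returns (P, pi): the transport plan and a hard assignment per Partlet,
    where pi[k] is a 0-based part index or None for "no part".
    """
    K, A = len(cost), len(cost[0])
    assert K >= A, "sketch assumes at least as many Partlets as parts"
    # Target masses: each real part receives mass 1 and the no-part column
    # absorbs the remaining K - A (each Partlet row sends mass 1).
    col_mass = [1.0] * A + [float(K - A)]
    # Gibbs kernel exp(-cost / eps) on the augmented K x (A + 1) cost matrix.
    P = [[math.exp(-c / eps) for c in list(row) + [no_part_cost]] for row in cost]
    for _ in range(iters):
        # Column step: rescale each column to its target mass.
        for a in range(A + 1):
            s = sum(P[k][a] for k in range(K))
            scale = col_mass[a] / s if s > 0 else 0.0
            for k in range(K):
                P[k][a] *= scale
        # Row step: rescale each row to sum to 1.
        for k in range(K):
            s = sum(P[k])
            P[k] = [p / s for p in P[k]]
    # Hard assignment (assumption): take the largest entry in each row.
    pi = []
    for k in range(K):
        best = max(range(A + 1), key=lambda a: P[k][a])
        pi.append(best if best < A else None)
    return P, pi

# Three Partlets, two ground-truth parts: the third Partlet fits neither well.
cost = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.9]]
P, pi = sinkhorn_assign(cost)
```

This plain-domain version can underflow for large costs or very small `eps`; a log-domain update would be the usual remedy.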

Text alignment (InfoNCE). Makes Partlet embeddings nameable by pulling matched (Partlet, text) pairs together and pushing unmatched pairs apart; $\tau$ is a temperature.

$$ \mathcal{L}_{\text{text}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} -\log\frac{\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_{\pi(k)}/\tau)} {\sum_{a=1}^{A}\exp(\hat{\mathbf{z}}_k\cdot\hat{\mathbf{t}}_a/\tau)} $$
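As a minimal sketch, the text alignment loss can be evaluated in plain Python as follows. The function name, the list-of-lists layout, the 0-based `pi` with `None` for $\emptyset$, and the default `tau=0.07` are illustrative assumptions rather than values from the source; the embeddings are used exactly as passed in.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_text_loss(z, t, pi, tau=0.07):
    """InfoNCE over matched Partlets.

    z:  K Partlet text embeddings (lists of floats)
    t:  A ground-truth text embeddings
    pi: pi[k] is the matched part index (0-based) or None if unmatched
    """
    matched = [k for k, a in enumerate(pi) if a is not None]
    if not matched:
        return 0.0  # assumption: no matched Partlets contributes zero loss
    total = 0.0
    for k in matched:
        logits = [dot(z[k], ta) / tau for ta in t]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[pi[k]]
    return total / len(matched)

z = [[1.0, 0.0], [0.0, 1.0]]
t = [[1.0, 0.0], [0.0, 1.0]]
loss = info_nce_text_loss(z, t, [0, 1], tau=1.0)
```

The denominator runs over all $A$ text embeddings, so each unmatched name acts as a negative for Partlet $k$.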

Mask supervision (BCE + Dice). Encourages accurate part boundaries and robust overlap with ground-truth parts; $\sigma$ denotes the sigmoid applied to the mask logits.

$$ \mathcal{L}_{\text{mask}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \Big[\mathrm{BCE}(\mathbf{m}_k,\mathbf{m}^{\mathrm{gt}}_{\pi(k)}) +\big(1-\mathrm{Dice}(\sigma(\mathbf{m}_k),\mathbf{m}^{\mathrm{gt}}_{\pi(k)})\big)\Big] $$
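A rough sketch of the mask loss, under assumptions the source does not state: BCE is computed from logits and averaged over the $N$ points, Dice is the soft Dice coefficient with a small smoothing constant `smooth`, and `pi` is 0-based with `None` for $\emptyset$. All function names are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable sigmoid for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce_with_logits(logits, targets):
    # Stable form of -[y log s(x) + (1 - y) log(1 - s(x))], averaged over points.
    per_point = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
                 for x, y in zip(logits, targets)]
    return sum(per_point) / len(per_point)

def soft_dice(probs, targets, smooth=1e-6):
    inter = sum(p * y for p, y in zip(probs, targets))
    return (2.0 * inter + smooth) / (sum(probs) + sum(targets) + smooth)

def mask_loss(mask_logits, gt_masks, pi):
    """Mean over matched Partlets of BCE + (1 - Dice)."""
    matched = [k for k, a in enumerate(pi) if a is not None]
    if not matched:
        return 0.0  # assumption: nothing to supervise without matches
    total = 0.0
    for k in matched:
        gt = gt_masks[pi[k]]
        probs = [sigmoid(x) for x in mask_logits[k]]
        total += bce_with_logits(mask_logits[k], gt) + (1.0 - soft_dice(probs, gt))
    return total / len(matched)

good = mask_loss([[10.0, -10.0]], [[1.0, 0.0]], [0])   # confident and correct
bad = mask_loss([[-10.0, 10.0]], [[1.0, 0.0]], [0])    # confident and wrong
```

BCE scores each point independently, while the Dice term depends on the overlap of the whole mask, which keeps small parts from being dominated by the background points.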

Partness loss. Learns when a Partlet should be “active” vs. “no-part”, enabling variable part counts. Here $\text{part}_k$ is the predicted partness score of Partlet $k$.

$$ \mathcal{L}_{\text{part}}=\frac{1}{K}\sum_{k=1}^{K}\mathrm{BCE}(\text{part}_k,\mathbf{1}[\pi(k)\neq\emptyset]) $$
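The partness loss could look like the sketch below. It assumes that $\text{part}_k$ is a raw logit (the source does not say whether it is a logit or a probability) and that `pi[k]` is `None` for unmatched Partlets; the function name is hypothetical.

```python
import math

def partness_loss(part_logits, pi):
    """Mean BCE between partness logits and the matched / unmatched indicator."""
    total = 0.0
    for x, a in zip(part_logits, pi):
        y = 0.0 if a is None else 1.0  # target 1[pi(k) != empty]
        # Stable binary cross-entropy computed directly from the logit.
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(part_logits)

# Two Partlets with undecided logits: one matched, one unmatched.
loss = partness_loss([0.0, 0.0], [0, None])
```

Unlike the text and mask losses, the average runs over all $K$ Partlets, so unmatched Partlets are explicitly trained toward "no-part".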

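Regularizers. Reduce over- and under-segmentation. The coverage term matches the soft size of each matched mask to the size of its ground-truth part, and the overlap term penalizes points whose summed mask probabilities deviate from 1, i.e. points claimed by several Partlets or by none. Here $m_{ki}$ is the $i$-th entry of $\mathbf{m}_k$.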

$$ \mathcal{L}_{\text{cov}}=\frac{1}{|\mathcal{M}|}\sum_{k\in\mathcal{M}} \left|\frac{\sum_i \sigma(m_{ki})-\sum_i m^{\mathrm{gt}}_{\pi(k)i}}{N}\right| \qquad \mathcal{L}_{\text{overlap}}=\frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{k=1}^{K}\sigma(m_{ki})-1\Big)^2 $$
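Both regularizers follow directly from the formulas and can be sketched in a few lines. The function names and the 0-based `pi` with `None` for $\emptyset$ are illustrative assumptions.

```python
import math

def sigmoid(x):
    # Numerically stable sigmoid for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def coverage_loss(mask_logits, gt_masks, pi):
    """Mean absolute gap between predicted and true mask size, as a fraction of N."""
    matched = [k for k, a in enumerate(pi) if a is not None]
    if not matched:
        return 0.0  # assumption: zero when nothing is matched
    n = len(mask_logits[0])
    total = 0.0
    for k in matched:
        pred_size = sum(sigmoid(x) for x in mask_logits[k])
        gt_size = sum(gt_masks[pi[k]])
        total += abs(pred_size - gt_size) / n
    return total / len(matched)

def overlap_loss(mask_logits):
    """Mean squared deviation from 1 of the per-point sum of mask probabilities."""
    k_count, n = len(mask_logits), len(mask_logits[0])
    total = 0.0
    for i in range(n):
        claimed = sum(sigmoid(mask_logits[k][i]) for k in range(k_count))
        total += (claimed - 1.0) ** 2
    return total / n

# Two Partlets that each claim every point with probability 0.5.
logits = [[0.0, 0.0], [0.0, 0.0]]
cov = coverage_loss(logits, [[1.0, 0.0], [0.0, 1.0]], [0, 1])
ov = overlap_loss(logits)
```

In this toy case both terms are zero even though the masks are uninformative, which shows why the regularizers complement the mask loss rather than replace it.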

Total objective. A weighted sum of the above terms (an auxiliary global alignment loss is also added but is omitted from the expression below):

$$ \mathcal{L}_{\text{total}}= \lambda_{\text{mask}}\mathcal{L}_{\text{mask}}+ \lambda_{\text{part}}\mathcal{L}_{\text{part}}+ \lambda_{\text{text}}\mathcal{L}_{\text{text}}+ \lambda_{\text{cov}}\mathcal{L}_{\text{cov}}+ \lambda_{\text{ov}}\mathcal{L}_{\text{overlap}} $$
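Combining the terms is a plain weighted sum, sketched below. The dictionary keys and all numeric values are placeholders, since the source gives no values for the $\lambda$ weights, and the auxiliary global alignment loss is left out because no formula is given for it.

```python
def total_loss(losses, weights):
    """Weighted sum over the five loss terms of the total objective."""
    terms = ("mask", "part", "text", "cov", "overlap")
    return sum(weights[name] * losses[name] for name in terms)

# Placeholder loss values and weights, for illustration only.
losses = {"mask": 1.0, "part": 2.0, "text": 3.0, "cov": 4.0, "overlap": 5.0}
weights = {"mask": 1.0, "part": 1.0, "text": 0.5, "cov": 1.0, "overlap": 1.0}
value = total_loss(losses, weights)
```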

Variant	mIoU↑	LA-mIoU↑	rLA-mIoU↑
Base model	0.312	0.030	0.194
No ℒ_cov, ℒ_overlap, ℒ_text	0.324	0.021	0.187
No ℒ_cov, ℒ_overlap	0.528	0.239	0.443
Geo input only	0.313	0.027	0.177
Feature concat	0.302	0.036	0.189
PartField+MPNet	0.451	0.199	0.368
ALIGN-Parts	0.600	0.316	0.529
