PointLaSA: Learnable Spatial Anchors for Feature Aggregation in Point Clouds

The paper and code will be released.

Overview

Efficient and robust feature aggregation is essential for point cloud understanding tasks such as classification and segmentation. Most existing methods employ k-nearest neighbor (KNN) search with fixed spatial anchors (FSA), which limits their ability to adapt to complex and diverse geometric patterns. To overcome this limitation, we propose PointLaSA, a novel framework based on Learnable Spatial Anchors for multi-head, multi-scale feature aggregation. Specifically, each attention head predicts an anchor offset to generate geometry-adaptive query points, while different neighborhood sizes across heads capture complementary local contexts. The fused features from all heads yield expressive point representations that are both flexible and discriminative. Extensive experiments on standard benchmarks demonstrate that PointLaSA consistently outperforms fixed-anchor KNN baselines across classification and segmentation tasks.
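As a concrete illustration, the sketch below shows one way the multi-head, multi-scale aggregation with learnable anchor offsets could be realized in PyTorch. All names (LaSAAggregation, index_points) and the per-head neighborhood sizes are our own assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn


def index_points(points, idx):
    # Gather features at the given indices: points (B, N, C), idx (B, ...) -> (B, ..., C).
    B = points.shape[0]
    view = [B] + [1] * (idx.dim() - 1)
    batch = torch.arange(B, device=points.device).view(view).expand_as(idx)
    return points[batch, idx]


class LaSAAggregation(nn.Module):
    """Multi-head KNN aggregation with learnable spatial anchors (illustrative sketch).

    Each head predicts a per-point offset, shifts its query anchor accordingly,
    gathers a head-specific number of neighbors, and max-pools their features.
    """

    def __init__(self, in_dim, out_dim, head_ks=(8, 16, 32)):
        super().__init__()
        self.head_ks = head_ks  # per-head neighborhood sizes (multi-scale)
        self.offset_mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 3))
            for _ in head_ks
        )  # one anchor-offset predictor per head
        self.feat_mlp = nn.Sequential(nn.Linear(in_dim + 3, out_dim), nn.ReLU())
        self.fuse = nn.Linear(out_dim * len(head_ks), out_dim)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates, feats: (B, N, C) features.
        head_outs = []
        for k, offset_mlp in zip(self.head_ks, self.offset_mlps):
            anchors = xyz + offset_mlp(feats)                                # geometry-adaptive query points
            idx = torch.cdist(anchors, xyz).topk(k, largest=False).indices  # (B, N, k) neighbor indices
            rel_xyz = index_points(xyz, idx) - anchors.unsqueeze(2)          # neighbor coords relative to anchor
            nb_feat = index_points(feats, idx)                               # neighbor features
            local = self.feat_mlp(torch.cat([nb_feat, rel_xyz], dim=-1))
            head_outs.append(local.max(dim=2).values)                        # max-pool over each neighborhood
        return self.fuse(torch.cat(head_outs, dim=-1))                       # fuse complementary heads
```

Under these assumptions, `LaSAAggregation(64, 128)(xyz, feats)` with `xyz` of shape (B, N, 3) and `feats` of shape (B, N, 64) would return fused per-point features of shape (B, N, 128).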

Motivation

Increasingly sophisticated local feature extractors have advanced the field by capturing finer-grained geometric patterns. However, once local geometry is sufficiently characterized, performance gains tend to plateau, and further complicating the extractors yields only marginal improvements. This trend suggests that, rather than relentlessly pursuing more intricate local descriptors, it may be more beneficial to explore how to obtain more appropriate receptive fields, which could have a greater impact on overall network performance.

Method Overview

Overall architecture of PointLaSA. We adopt a U-Net architecture to support both classification and segmentation tasks. In the encoder, a point feature transformation module first projects point features into a higher-dimensional space, followed by downsampling via farthest point sampling. Next, a spatial encoding module augments point descriptors with positional information, enriching local representations. At each Local Aggregation stage, we replace the conventional fixed-anchor KNN with the proposed Learnable Spatial Anchors, enabling geometry-adaptive feature aggregation. The decoder progressively restores spatial resolution through nearest-neighbor interpolation and feature propagation, and task-specific heads are applied for segmentation and classification. Throughout the network, we deliberately avoid complex feature extractors and instead employ lightweight MLPs and max pooling, allocating more computation to neighborhood selection for both efficiency and model compactness.
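To make the stage structure concrete, here is a simplified sketch of one encoder stage under the same assumptions as the previous snippet (it reuses LaSAAggregation and index_points from above). farthest_point_sample is a naive reference implementation; none of these names come from the released code.

```python
import torch
import torch.nn as nn


def farthest_point_sample(xyz, m):
    # Naive farthest point sampling: iteratively keep the point farthest from those already chosen.
    B, N, _ = xyz.shape
    idx = torch.zeros(B, m, dtype=torch.long, device=xyz.device)
    dist = torch.full((B, N), float("inf"), device=xyz.device)
    farthest = torch.zeros(B, dtype=torch.long, device=xyz.device)
    batch = torch.arange(B, device=xyz.device)
    for i in range(m):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)                  # (B, 1, 3) newly chosen point
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))   # distance to nearest chosen point
        farthest = dist.argmax(-1)                                    # next farthest point
    return idx


class EncoderStage(nn.Module):
    """One encoder stage: feature lift -> FPS downsampling -> spatial encoding -> LaSA aggregation."""

    def __init__(self, in_dim, out_dim, n_points):
        super().__init__()
        self.n_points = n_points                                           # points kept after FPS
        self.lift = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())   # lightweight feature transform
        self.pos_enc = nn.Sequential(nn.Linear(3, out_dim), nn.ReLU())     # spatial encoding of coordinates
        self.aggregate = LaSAAggregation(out_dim, out_dim)                 # learnable-anchor aggregation

    def forward(self, xyz, feats):
        feats = self.lift(feats)                                     # project to a higher dimension
        idx = farthest_point_sample(xyz, self.n_points)              # (B, n_points) sampled indices
        xyz_ds = index_points(xyz, idx)                              # downsampled coordinates
        feats_ds = index_points(feats, idx) + self.pos_enc(xyz_ds)   # augment with positional information
        return xyz_ds, self.aggregate(xyz_ds, feats_ds)
```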

Comparison between FSA-KNN and the proposed LaSA-KNN. FSA-KNN aggregates features within a single fixed-scale neighborhood. By contrast, LaSA-KNN introduces learnable anchor shifts to adaptively adjust point positions, and leverages multi-head, multi-scale KNN to construct diverse neighborhoods, enabling richer geometric perception and more discriminative feature extraction.
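For contrast, a fixed-spatial-anchor (FSA) baseline simply queries neighbors around the original, unshifted points at a single scale. A minimal sketch, assuming the same tensor layout as above:

```python
import torch


def fsa_knn(xyz, k):
    # Fixed spatial anchors: the query is the point itself and a single scale k is used.
    # xyz: (B, N, 3) -> neighbor indices (B, N, k).
    return torch.cdist(xyz, xyz).topk(k, largest=False).indices
```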

Experiments

We evaluate our model on four widely-used point cloud benchmarks spanning three representative tasks. For shape classification, we use ModelNet40 and ScanObjectNN; for part segmentation, ShapeNetPart; and for indoor scene semantic segmentation, ScanNet v2.

Network configurations used for different datasets.

Dataset Feature dimensions (per stage) Points (per stage)
ScanObjectNN 96-192-384 64-256-1024
ModelNet40 96-192-384 64-256-1024
ShapeNetPart 96-192-320-512 64-192-512-2048
ScanNet v2 64-96-160-288-512 --
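The same settings could be expressed as per-dataset configuration dictionaries. The structure and key names below are purely illustrative, and the values simply mirror the table (no per-stage point counts are listed for ScanNet v2, so that entry is left as None).

```python
# Illustrative per-dataset configurations mirroring the table above.
CONFIGS = {
    "ScanObjectNN": {"dims": [96, 192, 384], "points": [64, 256, 1024]},
    "ModelNet40":   {"dims": [96, 192, 384], "points": [64, 256, 1024]},
    "ShapeNetPart": {"dims": [96, 192, 320, 512], "points": [64, 192, 512, 2048]},
    "ScanNetV2":    {"dims": [64, 96, 160, 288, 512], "points": None},
}
```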

Classification results on ModelNet40

OA: overall accuracy. mAcc: mean per-class accuracy.

Method OA (%) mAcc (%)
PointNet 89.2 86.0
PointNet++ 93.0 90.7
DGCNN 92.9 90.2
PCT 93.2 -
PTv1 93.7 90.6
CurveNet 93.8 -
PointMLP 94.1 91.3
GBNet 93.8 91.0
PointNeXt 94.0 91.1
PTv2 94.2 91.6
DualMLP 93.7 -
PointLaSA (ours) 94.2 91.8

Classification results on ScanObjectNN

OA: overall accuracy. mAcc: mean per-class accuracy.

Method OA (%) mAcc (%)
PointNet 75.2 71.4
PointNet++ 86.2 84.4
DGCNN 86.1 84.3
PointMLP 87.7 86.4
PointNeXt 88.2 86.8
SPoTr 88.6 86.8
point2vec 87.5 86.0
CurveCloud 89.1 86.5
KPConvX 89.3 88.1
PointBERT* 83.1 -
PointMAE* 85.2 -
PPT* 89.5 -
PointLaSA (ours) 90.2 89.2

Part segmentation results (mean IoU %) on ShapeNetPart.
“Ins. mIoU” denotes instance-average mIoU; “Cls. mIoU” denotes class-average mIoU.

Method Ins. mIoU Cls. mIoU
PointNet 83.7 80.4
PointNet++ 85.1 81.9
DGCNN 85.2 82.3
PTv1 86.6 83.7
AGCN 86.9 85.1
RSCNN 86.2 84.0
CurveNet 86.8 84.2
OTMae3D 86.8 85.1
PointJEPA 83.9 85.8
AdaCrossNet - 85.1
PointLaSA (ours) 87.4 85.1

Semantic segmentation results on ScanNet v2.
Mean IoU (%) on validation and test sets.

Method Val mIoU Test mIoU
O-CNN 74.0 72.8
PointNet++ 53.5 33.9
PointConv 61.0 55.6
KPConv 69.2 68.0
Minkowski 72.2 73.4
SFormer 74.3 73.7
BPNet 73.9 74.9
PTv2 75.4 75.2
KPConvX-L 76.2 75.6
PointLaSA (ours) 76.0 75.4

Ablation study on PointLaSA components

✓ = included, ✗ = excluded. Metrics are classification accuracy (OA) on ModelNet40, instance mIoU on ShapeNetPart, and mIoU on ScanNet v2.

No. Offset Multi-Head Multi-Scale Multi-Dim ModelNet40 (OA) ShapeNet (Ins. mIoU) ScanNet v2 (mIoU)
1 94.1% 87.4% 75.4%
2 94.2% 87.2% 75.1%
3 94.2% 87.1% 75.0%
4 94.0% 87.0% 74.8%
5 93.9% 87.0% 74.6%
6 94.0% 87.1% 74.6%
7 93.8% 87.2% 74.4%
8 93.7% 87.0% 74.4%
9 93.6% 86.9% 74.6%
10 93.4% 86.8% 74.2%