TL;DR
BEVDet4D is a paper that adds temporal information on top of BEVDet.
- Concretely, the previous frame's BEV feature is spatially aligned with the current frame's BEV feature, the two are concatenated, and the result is fed into the BEV encoder for 3D object detection in the BEV view.
- The image encoder + view transformer from the BEVDet paper are kept entirely unchanged.
- Because two frames of information are available, velocity prediction improves substantially over the single-frame baseline.
Algorithm
Overall pipeline

The two frames are fused at the BEV-feature level (i.e. on the output of the View Transformer).
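This channel-wise fusion can be illustrated with a tiny numpy sketch (shapes are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical BEV feature maps of shape (channels, H, W).
prev_bev = np.random.rand(64, 128, 128)
curr_bev = np.random.rand(64, 128, 128)

# After aligning prev_bev to the current ego frame, the two maps are
# concatenated along the channel axis, doubling the channel count
# that the BEV encoder sees.
fused = np.concatenate([prev_bev, curr_bev], axis=0)
assert fused.shape == (128, 128, 128)
```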
Pseudocode for the algorithm:
```python
class BEVDet4D:
    def __init__(self):
        self.image_view_encoder = ImageViewEncoder()
        self.view_transformer = ViewTransformer()
        self.bev_encoder = BEVEncoder()
        self.head = DetectionHead()
        self.previous_bev_feature = None

    def spatial_alignment(self, feature):
        # Warp the previous frame's BEV feature into the current
        # frame's ego coordinate system (ego-motion compensation)
        aligned_feature = ...
        return aligned_feature

    def forward(self, current_image):
        image_feature = self.image_view_encoder(current_image)
        current_bev_feature = self.view_transformer(image_feature)

        if self.previous_bev_feature is not None:
            aligned_previous = self.spatial_alignment(self.previous_bev_feature)
            fused_feature = concatenate(aligned_previous, current_bev_feature)
        else:
            # First frame: duplicate the current feature so the BEV
            # encoder always sees the same channel count
            fused_feature = concatenate(current_bev_feature, current_bev_feature)

        # Fusion happens before the BEV encoder, i.e. on the
        # view-transformer output
        detections = self.head(self.bev_encoder(fused_feature))

        self.previous_bev_feature = current_bev_feature
        return detections


bevdet4d = BEVDet4D()
detections = bevdet4d.forward(current_image)
```
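The `spatial_alignment` step can be sketched concretely. The sketch below is a minimal numpy version that only compensates ego translation on a discrete grid (the paper compensates full ego motion, including rotation, with feature interpolation); the function name, grid resolution, and shift convention are assumptions for illustration:

```python
import numpy as np

def spatial_alignment(prev_bev, ego_shift_xy, grid_resolution=0.8):
    """Shift a (C, H, W) BEV feature map by the ego translation (in
    meters) between two frames, so both maps share the current ego
    frame. Rotation is ignored in this sketch."""
    dx = int(round(ego_shift_xy[0] / grid_resolution))  # cells along W
    dy = int(round(ego_shift_xy[1] / grid_resolution))  # cells along H
    shifted = np.roll(prev_bev, shift=(-dy, -dx), axis=(1, 2))
    # Zero out cells that wrapped around: they carry no information
    # from the previous frame.
    if dx > 0:
        shifted[:, :, -dx:] = 0
    elif dx < 0:
        shifted[:, :, :-dx] = 0
    if dy > 0:
        shifted[:, -dy:, :] = 0
    elif dy < 0:
        shifted[:, :-dy, :] = 0
    return shifted

# A feature at cell (2, 2) moves one cell toward the ego when the ego
# advances 1 m on a 1 m/cell grid.
feat = np.zeros((1, 4, 4))
feat[0, 2, 2] = 1.0
aligned = spatial_alignment(feat, (1.0, 0.0), grid_resolution=1.0)
```

Without this alignment, a static object would occupy different BEV cells in the two frames and the concatenated features would be misregistered.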
Result

The results are noticeably better than BEVDet's, especially on mAVE (velocity error).
Thought
- Not a major innovation; it reads more like a usage trick on top of BEVDet.
- BEVDet's computation is concentrated in the image encoder and view transformer, so reusing the previous frame's BEV feature fully amortizes the previous frame's computation, and the extra cost added to the current frame stays modest (the BEV encoder and task head are both lightweight).