URL
TL;DR
BEVDet4D builds on BEVDet by adding temporal information. Concretely, the previous frame's BEV feature is spatially aligned with the current frame's BEV feature, the two are concatenated, and the result is fed into the BEV Encoder for 3D object detection in the BEV view. The image encoder + view transformer from the BEVDet paper are kept completely unchanged. Because two frames of information are available, velocity prediction improves markedly over the single-frame model.
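The velocity gain is intuitive: with two aligned BEV frames a fixed time apart, an object's displacement between frames directly encodes its velocity. A minimal illustration with hypothetical positions (the 0.5 s interval matches the nuScenes 2 Hz keyframe rate):

```python
# Hypothetical numbers: an object's BEV centre in two consecutive keyframes,
# both expressed in the current ego frame after alignment.
frame_interval = 0.5   # seconds between nuScenes keyframes (2 Hz)
prev_xy = (10.0, 4.0)  # (x, y) in metres, previous frame
cur_xy = (12.0, 4.5)   # (x, y) in metres, current frame

# Velocity follows from displacement over the known interval.
vx = (cur_xy[0] - prev_xy[0]) / frame_interval
vy = (cur_xy[1] - prev_xy[1]) / frame_interval
print(vx, vy)  # 4.0 1.0
```

A single-frame detector must regress these values from appearance cues alone, which is why mAVE benefits most from the temporal input.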
Algorithm
Overall pipeline:
![bevdet4d.png](https://s2.loli.net/2023/08/06/XwCSmtKiFWJEzQk.png)
The two frames' information is fused at the BEV-feature level (the output of the View Transformer). Pseudocode for the algorithm:
```python
class BEVDet4D:
    def __init__(self):
        # Encoders, transformer, and head; the image-view encoder and
        # view transformer are unchanged from BEVDet.
        self.image_view_encoder = ImageViewEncoder()
        self.view_transformer = ViewTransformer()
        self.bev_encoder = BEVEncoder()
        self.head = DetectionHead()
        self.previous_bev_feature = None

    def spatial_alignment(self, feature):
        # Align the previous frame's BEV feature to the current ego frame.
        # In the released code this step appears to be disabled: the previous
        # BEV feature is concatenated with the current one directly.
        aligned_feature = feature
        return aligned_feature

    def forward(self, current_image):
        # Run the image-view encoder and view transformer on the current frame.
        image_feature = self.image_view_encoder(current_image)
        current_bev_feature = self.view_transformer(image_feature)

        # If a previous frame's BEV feature exists, align it and fuse;
        # fusion happens at the view-transformer output, before the BEV encoder.
        if self.previous_bev_feature is not None:
            aligned_previous = self.spatial_alignment(self.previous_bev_feature)
            fused_feature = concatenate(aligned_previous, current_bev_feature)
        else:
            fused_feature = current_bev_feature

        # Save the current frame's BEV feature for the next frame.
        self.previous_bev_feature = current_bev_feature

        # BEV encoder, then the 3D detection head.
        encoded_feature = self.bev_encoder(fused_feature)
        detections = self.head(encoded_feature)
        return detections


# Instantiate BEVDet4D and run a forward pass.
bevdet4d = BEVDet4D()
detections = bevdet4d.forward(current_image)
```
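As a concrete sketch of the align-then-concat step, here is a translation-only version in NumPy: the previous BEV feature is shifted by the ego displacement (rounded to whole BEV cells) and concatenated with the current feature along the channel axis. The function name, the 0.8 m cell size, and the integer-shift simplification are assumptions for illustration; the paper's alignment also handles rotation and uses interpolation rather than nearest-cell shifts.

```python
import numpy as np

def align_prev_bev(prev_bev, ego_dx, ego_dy, cell_size=0.8):
    """Shift the previous frame's BEV feature so static objects fall on the
    same cells as in the current frame. Translation-only, nearest-cell sketch."""
    c, h, w = prev_bev.shape
    sy = int(round(ego_dy / cell_size))  # shift in rows
    sx = int(round(ego_dx / cell_size))  # shift in columns
    out = np.zeros_like(prev_bev)        # cells leaving the grid become zero
    dst_r = slice(max(sy, 0), h + min(sy, 0))
    dst_c = slice(max(sx, 0), w + min(sx, 0))
    src_r = slice(max(-sy, 0), h + min(-sy, 0))
    src_c = slice(max(-sx, 0), w + min(-sx, 0))
    out[:, dst_r, dst_c] = prev_bev[:, src_r, src_c]
    return out

# Fuse: align the previous feature, then concat along channels.
prev_bev = np.random.rand(64, 128, 128).astype(np.float32)
cur_bev = np.random.rand(64, 128, 128).astype(np.float32)
aligned = align_prev_bev(prev_bev, ego_dx=1.6, ego_dy=0.0)  # ego moved 1.6 m
fused = np.concatenate([aligned, cur_bev], axis=0)          # (128, 128, 128)
```

Doubling the channel count via concatenation is what lets the BEV encoder compare the two frames cell by cell.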
Result
![BEVdet4D2.png](https://s2.loli.net/2023/08/06/TgX7qpHDZUjsEGn.png)
Noticeably better than BEVDet across the board, especially on mAVE (mean average velocity error).
Thought