A Keypoint-based Global Association Network for Lane Detection

URL

paper: https://arxiv.org/pdf/2204.07335.pdf
code: https://github.com/Wolfwjs/GANet

TL;DR

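Traditional keypoint-based lane detection networks usually have to shape the output head around the task definition. In UFLD, for example, every lane occupies its own output channel.
This paper proposes a new lane detection paradigm that decouples the task from the network: a fixed number of output channels is enough to detect an arbitrary number of lanes.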

Algorithm

Overall idea

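How can a single output channel detect an arbitrary number of lanes? A simple approach that comes to mind:
1. Output a heat map with channel = 1, where the value at each position is the probability that the position is a lane keypoint.
2. Apply a threshold to get the coordinates of all lane keypoints (each lane should have >= 2 keypoints).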
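3. Cluster the keypoints into groups, each group representing one lane. The “distance” used for clustering has to be designed from prior knowledge (for example: lanes run close to vertical in the image, so keypoints on the same lane are spread out vertically, and the vertical distance should be weighted less than the horizontal distance).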
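The three steps above can be sketched in plain Python. This is only an illustration of the naive baseline, not code from the paper: the function names, the threshold, the distance weights `wx` and `wy`, and `max_dist` are all hypothetical. The sketch assumes near-vertical lanes, so the vertical gap is down-weighted to keep points stacked along one lane in the same group.

```python
import math


def extract_keypoints(heatmap, threshold=0.5):
    """Return (x, y) for every heat map cell whose score exceeds the threshold."""
    return [
        (x, y)
        for y, row in enumerate(heatmap)
        for x, score in enumerate(row)
        if score > threshold
    ]


def weighted_distance(p, q, wx=1.0, wy=0.1):
    """Hand-crafted distance: the vertical gap counts less than the horizontal gap."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt(wx * dx * dx + wy * dy * dy)


def cluster_keypoints(points, max_dist=2.0, wx=1.0, wy=0.1):
    """Single-linkage clustering with union-find; groups with < 2 points are dropped."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if weighted_distance(points[i], points[j], wx, wy) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return [group for group in groups.values() if len(group) >= 2]


# Toy 5 x 6 heat map with two vertical lanes at x = 1 and x = 4.
heatmap = [[0.0] * 6 for _ in range(5)]
for y in range(5):
    heatmap[y][1] = 0.9
    heatmap[y][4] = 0.8

keypoints = extract_keypoints(heatmap)
lanes = cluster_keypoints(keypoints)
```

Every tunable here (`threshold`, `wx`, `wy`, `max_dist`) is a hand-picked prior, which is exactly the weakness the next paragraph points out.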
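The clustering step above bakes in a lot of hand-crafted prior knowledge, which is not very “machine learning”, so the process can be improved:
1. Output a heat map with channel = 1 and an offset map with channel = 2. The heat map value at each position is still the probability that the position is a lane keypoint. The offset map value at each position is the offset (x, y) from that point to the start point of its lane, assuming the point is a keypoint. The start point is the keypoint of a lane closest to the bottom of the image.
2. Apply a threshold to get all keypoints, then shift each keypoint by the offset at its position to get a new point p'. In theory, every keypoint on the same lane lands on the same p', which is the start point of that lane.
3. A neural network rarely fits to loss == 0, so the p' points still have to be clustered, with each cluster representing one lane instance. In effect, the distance logic of the clustering is absorbed into the network.
This is the core idea of GANet, the algorithm proposed in the paper: use a fixed, finite number of output channels to predict an arbitrary number of lanes.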
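As a rough illustration of the grouping step, the sketch below shifts each keypoint by its predicted offset and groups the resulting p' points with a simple radius test. It is a minimal sketch under stated assumptions, not the paper's association code: `group_by_start_point`, `radius`, and the toy coordinates are hypothetical, and the offsets are made-up values with a little noise to mimic an imperfect network.

```python
import math


def group_by_start_point(keypoints, offsets, radius=2.0):
    """Group keypoints whose shifted position p' = p + offset falls near the same spot."""
    lanes = []  # each entry: {"center": running mean of p', "points": original keypoints}
    for (x, y), (dx, dy) in zip(keypoints, offsets):
        sx, sy = x + dx, y + dy  # predicted start point p' for this keypoint
        for lane in lanes:
            cx, cy = lane["center"]
            if math.hypot(sx - cx, sy - cy) <= radius:
                lane["points"].append((x, y))
                n = len(lane["points"])
                # Incremental update of the mean of all p' assigned to this lane.
                lane["center"] = (cx + (sx - cx) / n, cy + (sy - cy) / n)
                break
        else:
            # No existing lane is close enough, so this p' starts a new lane.
            lanes.append({"center": (sx, sy), "points": [(x, y)]})
    return lanes


# Two toy lanes whose true start points are (10, 100) and (50, 100).
keypoints = [(10, 100), (12, 80), (14, 60), (50, 100), (47, 80), (44, 60)]
offsets = [
    (0.3, -0.2), (-2.4, 19.7), (-3.8, 40.5),  # noisy offsets toward (10, 100)
    (-0.1, 0.4), (3.2, 20.1), (5.7, 39.6),    # noisy offsets toward (50, 100)
]
lanes = group_by_start_point(keypoints, offsets)
```

The only remaining hand-set value is `radius`, and a plain Euclidean distance is enough because the network has already pulled each lane's keypoints toward one location.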

Network architecture

LFA（Lane-aware Feature Aggregator）

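Motivation

A standard 2D convolution samples features on a fixed grid, which is a poor fit for the long, thin shape of lane lines.
The authors therefore refine the local feature at each keypoint in two steps:
1. Predict the offsets to the num_adjacent_keypoints nearest keypoints on the same lane, with explicit supervision.
2. Use the predicted offsets to guide a deformable convolution that refines the local feature of the keypoint.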
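To make the explicit supervision in step 1 concrete, here is one way the ground-truth offsets for a single keypoint could be built. This is a sketch under assumptions, not the paper's target definition: `adjacent_offset_targets` is a hypothetical helper, the lane is assumed to be an ordered list of (x, y) keypoints, and "adjacent" is taken to mean nearest in that ordering.

```python
def adjacent_offset_targets(lane_points, index, num_adjacent_keypoints=4):
    """Offsets from lane_points[index] to its nearest neighbours along the lane.

    lane_points is an ordered list of (x, y) keypoints on one lane. Neighbours
    are picked by how close their position in the list is to `index`.
    """
    x0, y0 = lane_points[index]
    neighbours = sorted(
        (i for i in range(len(lane_points)) if i != index),
        key=lambda i: (abs(i - index), i),
    )[:num_adjacent_keypoints]
    # Report the offsets in lane order so the target layout is deterministic.
    return [
        (lane_points[i][0] - x0, lane_points[i][1] - y0)
        for i in sorted(neighbours)
    ]


# Toy lane listed from the bottom of the image upward.
lane = [(10, 100), (12, 90), (14, 80), (16, 70), (18, 60)]
targets = adjacent_offset_targets(lane, index=2, num_adjacent_keypoints=2)
```

A regression loss between these targets and the predicted offsets at the keypoint location would then provide the supervision signal; which loss the authors use is not stated in this note.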

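Code implementation

import math

import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class LFA(nn.Module):
    def __init__(self, in_channels, out_channels, num_adjacent_keypoints=5):
        super().__init__()
        if num_adjacent_keypoints % 2 == 0:
            raise ValueError("num_adjacent_keypoints must be odd so padding keeps the width unchanged")
        self.num_adjacent_keypoints = num_adjacent_keypoints
        # One 2D offset per sampling point, i.e. per adjacent keypoint.
        self.offset_conv = nn.Conv2d(in_channels, 2 * num_adjacent_keypoints, kernel_size=1)
        # deform_conv2d needs 2 * kernel_h * kernel_w offset channels, so the kernel is
        # 1 x num_adjacent_keypoints (a 3x3 kernel would need 18 channels, not 10).
        self.deform_conv_weight = nn.Parameter(
            torch.empty(out_channels, in_channels, 1, num_adjacent_keypoints)
        )
        nn.init.kaiming_uniform_(self.deform_conv_weight, a=math.sqrt(5))

    def forward(self, x):
        # x: input feature map with shape [batch_size, in_channels, height, width]

        # Step 1: predict the offsets for the adjacent keypoints
        # shape: [batch_size, 2 * num_adjacent_keypoints, height, width]
        offsets = self.offset_conv(x)

        # Step 2: aggregate features with a deformable convolution.
        # Note: deform_conv2d adds the offsets to the regular kernel grid positions.
        aggregated_features = deform_conv2d(
            x, offsets, self.deform_conv_weight,
            stride=1, padding=(0, self.num_adjacent_keypoints // 2),
        )

        return aggregated_features, offsets  # return both aggregated features and offsets


# Initialize LFA module
lfa = LFA(in_channels=64, out_channels=128, num_adjacent_keypoints=5)
# Dummy input feature map with shape [batch_size, in_channels, height, width]
input_feature_map = torch.randn(16, 64, 128, 128)
# Forward pass
aggregated_features, predicted_offsets = lfa(input_feature_map)
# aggregated_features: the output features, shape [16, 128, 128, 128]
# predicted_offsets: predicted offsets to the num_adjacent_keypoints adjacent keypoints
# on the same lane, shape [16, 10, 128, 128]; these are explicitly supervised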

Thought

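The paper proposes a flexible lane detection paradigm that extends well, and at the time of writing it ranks near the top of lane detection leaderboards.
Another contribution is the LFA module, which relies on deformable convolution. That makes it hard to deploy on most edge chips, which have limited support for dynamic operations.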