Hit 54
Subject [IEEE CSVT] Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection (by Jung Uk Kim) is accepted in IEEE Transactions on Circuits and Systems for Video Technology
Date 2021-04-21
Title: Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection
Authors: Jung Uk Kim, Sungjune Park, Yong Man Ro
Multispectral pedestrian detection has received great attention in recent years as multispectral modalities (i.e. color and thermal) can provide complementary visual information. However, there are major inherent issues in multispectral pedestrian detection. First, the cameras of the two modalities have different field-of-views (FoVs), so that image pairs are often miscalibrated. Second, modality discrepancy is observed, because image pairs are captured at different wavelengths. In this paper, to alleviate these issues, we propose a new uncertainty-aware multispectral pedestrian detection framework. In our framework, we consider two types of uncertainties: (1) Region of Interest (RoI) uncertainty and (2) predictive uncertainty. For the miscalibration issue, we propose RoI uncertainty which represents the reliability of the RoI candidates. With the RoI uncertainty, when combining two modal features, we devise uncertainty-aware feature fusion (UFF) module to reduce the effect of RoI features with high RoI uncertainty. We also propose uncertainty-aware cross-modal guiding (UCG) module for the modality discrepancy. In the UCG module, we use the predictive uncertainty, which indicates how reliable the prediction of the RoI feature is. Based on the predictive uncertainty, the UCG module guides the feature distribution of high predictive uncertain (less reliable) modality to resemble that of low predictive uncertain (more reliable) modality. The UCG module can encode more discriminative features by guiding feature distributions of two modalities to be similar. With comprehensive experiments on the public multispectral datasets, we verified that our method reduces the effect of the miscalibration and alleviates the modality discrepancy, outperforming existing state-of-the-art methods.