Subject [IEEE TNNLS] Visual Object Detection with Object Sounds (Jung Uk Kim) has been accepted to IEEE Transactions on Neural Networks and Learning Systems
Name Administrator
Date 2023-10-08
Title: Enabling Visual Object Detection with Object Sounds via Visual Modality Recalling Memory

Authors: Jung Uk Kim and Yong Man Ro

When humans hear the sound of an object, they recall the associated visual information and integrate the sound with the recalled visual information to detect the object. In this paper, we present a novel sound-based object detector that mimics this human process. We design a Visual Modality Recalling (VMR) memory that recalls visual modal information given an audio modal input (i.e., sound). To this end, we propose a visual modality recalling loss and an audio-visual association loss, which guide the VMR memory to memorize visual modal information by establishing associations between the audio and visual modalities. Audio-visual integration is then performed using the visual modal information recalled through the VMR memory together with the original audio modal input. In this step, we introduce an integrated feature contrastive loss, which allows the integrated feature to be embedded as if it were encoded from both the audio and visual modalities. This guidance enables our sound-based object detector to perform robust object detection even when only sound is provided. We believe that our work is a cornerstone study that offers a new perspective on conventional object detection studies, which rely only on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.
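
The recall-then-integrate idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the memory architecture or the exact form of the losses, so the key-value memory with softmax addressing, the mean-squared-error stand-in for the visual modality recalling loss, the InfoNCE-style stand-in for the integrated feature contrastive loss, and the integration by simple concatenation are all assumptions made for illustration. Every function name below is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

def recall_visual(audio_feat, keys, values):
    """Assumed memory read: address the key slots with the audio feature,
    then return the weighted sum of value slots as the recalled visual feature."""
    weights = softmax([dot(audio_feat, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def recalling_loss(recalled, visual_feat):
    """Assumed stand-in for the recalling loss: mean squared error between
    the recalled feature and the real visual feature."""
    return sum((r - v) ** 2 for r, v in zip(recalled, visual_feat)) / len(recalled)

def contrastive_loss(anchors, positives, temperature=0.1):
    """Assumed InfoNCE-style loss: anchors[i] should be close to positives[i]
    and far from positives[j] for j != i."""
    loss = 0.0
    for i, a in enumerate(anchors):
        probs = softmax([cosine(a, p) / temperature for p in positives])
        loss -= math.log(probs[i])
    return loss / len(anchors)

# Toy memory with two slots: 2-D audio keys, 3-D visual values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

audio = [5.0, 0.0]                      # audio feature that matches slot 0
recalled = recall_visual(audio, keys, values)
integrated = audio + recalled           # integration by concatenation (assumption)

# Matched pairs should give a lower contrastive loss than swapped pairs.
feats = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(feats, feats)
swapped = contrastive_loss(feats, feats[::-1])
```

In this toy setup the audio feature mostly addresses the first slot, so the recalled vector lies close to that slot's stored visual value. At inference time only `audio` would be needed, which mirrors the abstract's claim that detection works when only sound is provided.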

“Note: This work was done while Dr. Jung Uk Kim was a PhD student at KAIST. After completing his PhD, he is now a professor at Kyung Hee University.”