Notice

Subject [ICCV 2021] Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video (by Minsu Kim and Joanna Hong) has been accepted to ICCV 2021
Date 2021-07-23
Title: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Authors: Minsu Kim*, Joanna Hong*, Se Jin Park, and Yong Man Ro (*: equally contributed)
 
In this paper, we introduce a novel audio-visual multi-modal bridging framework that can utilize both audio and visual information, even with uni-modal inputs. We achieve the multi-modal bridging with a memory network that consists of two modality-specific memories: a source-key memory and a target-value memory. These memories store source and target modal representations, respectively. An associative bridge is then constructed between the source-key memory and the target-value memory to capture the interrelationship between the two. Once this interrelationship is learned through the associative bridge, the target-value memory can be accessed using only the source modality and the source-key memory, without the target modality. Accordingly, the proposed framework can recall target modal representations from source modal inputs alone and provide rich information for its downstream tasks. We apply the proposed framework to two tasks: lip reading and speech reconstruction from silent video. Through the proposed associative bridge and modality-specific memories, the knowledge for each task is enriched with the recalled audio context, and both tasks achieve state-of-the-art performance. We also verify that the associative bridge properly relates the source and target memories.
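
The recall step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity addressing, the softmax scaling factor, the slot count, and all names (recall_target, key_memory, value_memory) are assumptions made for illustration. It only shows the general idea that a source (visual) feature addresses the source-key memory, and the resulting weights read out a target (audio) representation from the target-value memory.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores, scale=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(scale * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_target(source_feature, key_memory, value_memory, scale=10.0):
    """Recall a target-modal vector from a source-modal feature.

    Assumed addressing scheme (hypothetical, for illustration only):
    slot i of key_memory is paired with slot i of value_memory, and the
    addressing weights come from scaled cosine similarity.
    """
    if len(key_memory) != len(value_memory):
        raise ValueError("key and value memories need the same number of slots")
    scores = [cosine(source_feature, key) for key in key_memory]
    weights = softmax(scores, scale)
    dim = len(value_memory[0])
    recalled = [
        sum(w * value[i] for w, value in zip(weights, value_memory))
        for i in range(dim)
    ]
    return recalled, weights


# Toy memories: 3 slots, 2-d source keys, 3-d target values (made-up numbers).
key_memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
value_memory = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Only the source modality is given; the target representation is recalled.
recalled, weights = recall_target([0.9, 0.1], key_memory, value_memory)
```

In this toy setup the source feature is closest to the first key slot, so the recalled vector is dominated by the first value slot. In a trained system both memories would be learned, and the recalled vector would be passed to the downstream lip reading or speech reconstruction model.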