International Conferences

Date: December 2023
Title: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Authors: Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, and Yong Man Ro
Publisher: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
URL: