Show and Speak: Directly Synthesize Spoken Description of Images

23 October 2020

Papers citing "Show and Speak: Directly Synthesize Spoken Description of Images"

Title
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
MLLM, VLM
252 927 0
24 Sep 2019