Connecting What to Say With Where to Look by Modeling Human Attention Traces

12 May 2021

Babak Damavandi

Papers citing "Connecting What to Say With Where to Look by Modeling Human Attention Traces"


From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, S. Cascianelli, G. Fiameni, Rita Cucchiara
3DV, VLM, MLLM | 53 | 254 | 0 | 14 Jul 2021

Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao
MLLM, VLM | 250 | 927 | 0 | 24 Sep 2019