TIGEr: Text-to-Image Grounding for Image Caption Evaluation

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

4 September 2019

Lei Zhang

Pengchuan Zhang

Papers citing "TIGEr: Text-to-Image Grounding for Image Caption Evaluation"

12 / 12 papers shown

Title
Neptune: The Long Orbit to Benchmarking Long Video Understanding Arsha Nagrani Mingda Zhang Ramin Mehran Rachel Hornung N. B. Gundavarapu ... Boqing Gong Cordelia Schmid Mikhail Sirotenko Yukun Zhu Tobias Weyand 100 4 0 12 Dec 2024
4-LEGS: 4D Language Embedded Gaussian Splatting Gal Fiebelman Tamir Cohen Ayellet Morgenstern Peter Hedman Hadar Averbuch-Elor 3DGS 36 1 0 14 Oct 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning Yuiga Wada Kanta Kaneda Daichi Saito Komei Sugiura 29 24 0 28 Feb 2024
See, Say, and Segment: Teaching LMMs to Overcome False Premises Tsung-Han Wu Giscard Biamby David M. Chan Lisa Dunlap Ritwik Gupta Xudong Wang Joseph E. Gonzalez Trevor Darrell VLM MLLM 30 18 0 13 Dec 2023
The Challenges of Image Generation Models in Generating Multi-Component Images Tham Yik Foong Shashank Kotyan Poyuan Mao Danilo Vasconcellos Vargas EGVM 34 1 0 22 Nov 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training J. Hu Roberto Cavicchioli Alessandro Capotondi 19 6 0 20 May 2023
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models Jin-Hwa Kim Yunji Kim Jiyoung Lee Kang Min Yoo Sang-Woo Lee EGVM 19 32 0 25 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics David M. Chan Austin Myers Sudheendra Vijayanarasimhan David A. Ross Bryan Seybold John F. Canny 23 6 0 12 May 2022
Injecting Semantic Concepts into End-to-End Image Captioning Zhiyuan Fang Jianfeng Wang Xiaowei Hu Lin Liang Zhe Gan Lijuan Wang Yezhou Yang Zicheng Liu ViT VLM 19 85 0 09 Dec 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Yaya Shi Xu Yang Haiyang Xu Chunfen Yuan Bing Li Weiming Hu Zhengjun Zha 31 33 0 17 Nov 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 53 244 0 14 Jul 2021
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 144 1,458 0 06 Jun 2016