Language Models for Image Captioning: The Quirks and What Works

7 May 2015

Li Deng

Papers citing "Language Models for Image Captioning: The Quirks and What Works"

32 / 32 papers shown

Title
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model Cheng Yang Yang Sui Jinqi Xiao Lingyi Huang Yu Gong ... Jinghua Yan Y. Bai P. Sadayappan Xia Hu Bo Yuan VLM 53 0 0 24 Mar 2025
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores Chantal Shaib Joe Barrow Jiuding Sun Alexa F. Siu Byron C. Wallace A. Nenkova 66 31 0 01 Mar 2024
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models Luke Vilnis Yury Zemlyanskiy Patrick C. Murray Alexandre Passos Sumit Sanghai 54 9 0 18 Oct 2022
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention S. Tan Runpei Dong Kaisheng Ma 22 2 0 03 Nov 2021
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Pedro M. Domingos MLT 21 70 0 30 Nov 2020
MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC) C. Sur 17 16 0 15 Feb 2020
Going Beneath the Surface: Evaluating Image Captioning for Grammaticality, Truthfulness and Diversity Huiyuan Xie Tom Sherborne A. Kuhnle Ann A. Copestake DiffM 17 9 0 19 Dec 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings Gregor Wiedemann Steffen Remus Avi Chawla Chris Biemann 11 174 0 23 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators Kuang-Huei Lee Hamid Palangi Xi Chen Houdong Hu Jianfeng Gao VLM 11 37 0 22 Sep 2019
Compositional Generalization in Image Captioning Mitja Nikolaus Mostafa Abdou Matthew Lamm Rahul Aralikatte Desmond Elliott CoGe 8 49 0 10 Sep 2019
MeetUp! A Corpus of Joint Activity Dialogues in a Visual Environment N. Ilinykh Sina Zarrieß David Schlangen 13 43 0 11 Jul 2019
Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity Glorianna Jagfeld Sabrina Jenne Ngoc Thang Vu AIMat 25 24 0 11 Oct 2018
Neural Aesthetic Image Reviewer Wenshan Wang Su Yang Weishan Zhang Jiulong Zhang 6 38 0 28 Feb 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning Hongge Chen Huan Zhang Pin-Yu Chen Jinfeng Yi Cho-Jui Hsieh GAN AAML 19 49 0 06 Dec 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space Liwei Wang A. Schwing Svetlana Lazebnik CoGe 16 175 0 19 Nov 2017
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding Jiahong Wu He Zheng Bo-Lu Zhao Yixin Li Baoming Yan ... Shipei Zhou G. Lin Yanwei Fu Yizhou Wang Yonggang Wang VLM 22 149 0 17 Nov 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Yang Xian Yingli Tian VLM 15 22 0 15 Sep 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images Rakshith Shetty Bernt Schiele Mario Fritz 19 223 0 30 Mar 2017
Where to put the Image in an Image Caption Generator Marc Tanti Albert Gatt K. Camilleri 39 96 0 27 Mar 2017
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson Basura Fernando Mark Johnson Stephen Gould 16 232 0 02 Dec 2016
Semantic Regularisation for Recurrent Image Annotation Feng Liu Tao Xiang Timothy M. Hospedales Wankou Yang Changyin Sun 21 103 0 16 Nov 2016
Boosting Image Captioning with Attributes Ting Yao Yingwei Pan Yehao Li Zhaofan Qiu Tao Mei VLM 11 620 0 05 Nov 2016
Seeing with Humans: Gaze-Assisted Neural Image Captioning Yusuke Sugano Andreas Bulling 14 68 0 18 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 11 1,883 0 29 Jul 2016
Movie Description Anna Rohrbach Atousa Torabi Marcus Rohrbach Niket Tandon C. Pal Hugo Larochelle Aaron Courville Bernt Schiele 3DV VGen 22 353 0 12 May 2016
Visual Storytelling Ting-Hao 'Kenneth' Huang Huang Francis Ferraro N. Mostafazadeh Ishan Misra ... C. L. Zitnick Devi Parikh Lucy Vanderwende Michel Galley Margaret Mitchell VGen 9 464 0 13 Apr 2016
Rich Image Captioning in the Wild Kenneth Tran Xiaodong He Lei Zhang Jian Sun Cornelia Carapcea Chris Thrasher Chris Buehler Chris Sienkiewicz VLM 9 123 0 30 Mar 2016
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering Huijuan Xu Kate Saenko 22 760 0 17 Nov 2015
Describing Multimedia Content using Attention-based Encoder--Decoder Networks Kyunghyun Cho Aaron Courville Yoshua Bengio 26 410 0 04 Jul 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language Yingwei Pan Tao Mei Ting Yao Houqiang Li Y. Rui 24 534 0 07 May 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao W. Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 30 1,234 0 20 Dec 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 23 6,030 0 17 Nov 2014