Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,580 papers shown

Semi-Supervised Panoptic Narrative Grounding

ACM Multimedia (ACM MM), 2023

Jiayi Ji

212

27 Oct 2023

Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

263

26 Oct 2023

Cross-modal Active Complementary Learning with Self-refining Correspondence

Neural Information Processing Systems (NeurIPS), 2023

283

26 Oct 2023

FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing

Anant Khandelwal

451

24 Oct 2023

CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Lei Li

246

24 Oct 2023

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

352

19 Oct 2023

Getting aligned on representational alignment

...

320

134

18 Oct 2023

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

International Conference on Learning Representations (ICLR), 2023

Xiaojian Ma

339

16 Oct 2023

Few-shot Action Recognition with Captioning Foundation Models

322

16 Oct 2023

Visual Question Generation in Bengali

229

12 Oct 2023

CLIP for Lightweight Semantic Segmentation

Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023

Ke Jin

Wankou Yang

VLM

171

11 Oct 2023

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

International Conference Robotics and Computer Vision (ICRCV), 2023

156

11 Oct 2023

A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection

Yang Wang

Jiaogen Zhou

Jihong Guan

304

09 Oct 2023

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

IEEE International Conference on Robotics and Automation (ICRA), 2023

Andrew James Willmott

Danny Birch

Daniel Maund

Jamie Shotton

MLLM

380

289

03 Oct 2023

Constructing Image-Text Pair Dataset from Books

168

03 Oct 2023

Application of frozen large-scale models to multimodal task-oriented dialogue

131

02 Oct 2023

YOLOR-Based Multi-Task Learning

193

29 Sep 2023

PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

Yuxuan Liu

Zecheng Zhang

Hayden Schaeffer

208

28 Sep 2023

XVO: Generalized Visual Odometry via Cross-Modal Self-Training

IEEE International Conference on Computer Vision (ICCV), 2023

Tohida Rehman

Ronit Mandal

Jimuyang Zhang

Debarshi Kumar Sanyal

SSL

359

28 Sep 2023

Social Media Fashion Knowledge Extraction as Captioning

176

28 Sep 2023

Attention Sorting Combats Recency Bias In Long Context Language Models

A. Peysakhovich

Adam Lerer

LRM RALM

310

28 Sep 2023

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Andrea Madotto

Tushar Nagarajan

...

Yue Liu

Babak Damavandi

Anuj Kumar

MLLM

286

110

27 Sep 2023

CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading

Wu Yuan

163

27 Sep 2023

FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images

201

24 Sep 2023

A Survey on Image-text Multimodal Models

Ruifeng Guo

Jingxuan Wei

Linzhuang Sun

Khai-Nguyen Nguyen

Guiyong Chang

Dawei Liu

Sibo Zhang

Zhengbing Yao

Mingjun Xu

Liping Bu

VLM

316

23 Sep 2023

An Empirical Study of Attention Networks for Semantic Segmentation

Zhiyan Liu

216

19 Sep 2023

R2GenGPT: Radiology Report Generation with Frozen LLMs

Lingqiao Liu

220

139

18 Sep 2023

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

112

18 Sep 2023

Holistic Geometric Feature Learning for Structured Reconstruction

IEEE International Conference on Computer Vision (ICCV), 2023

183

18 Sep 2023

Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding

Fan Wang

144

15 Sep 2023

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jeong Hun Yeo

185

15 Sep 2023

PatFig: Generating Short and Long Captions for Patent Figures

Dana Aubakirova

Kim Gerdes

Lufei Liu

201

15 Sep 2023

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

294

14 Sep 2023

Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Mykel Kochenderfer

254

12 Sep 2023

SparseSwin: Swin Transformer with Sparse Transformer Block

Krisna Pinasthika

Blessius Sheldo Putra Laksono

Riyandi Banovbi Putera Irsal

Syifa’ Hukma Shabiyya

N. Yudistira

ViT

238

11 Sep 2023

C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

William Theisen

Walter J. Scheirer

CLIP VLM

197

06 Sep 2023

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

Noriyuki Kojima

Hadar Averbuch-Elor

Yoav Artzi

317

06 Sep 2023

Exchanging-based Multimodal Fusion with Transformer

Xiang Li

174

05 Sep 2023

Distraction-free Embeddings for Robust VQA

194

31 Aug 2023

FIRE: Food Image to REcipe generation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

265

28 Aug 2023

Goodhart's Law Applies to NLP's Explanation Benchmarks

Findings (Findings), 2023

Jennifer Hsia

Danish Pruthi

Aarti Singh

Zachary Chase Lipton

215

28 Aug 2023

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Yaowei Wang

212

25 Aug 2023

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

AAAI Conference on Artificial Intelligence (AAAI), 2023

283

123

24 Aug 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

IEEE International Conference on Computer Vision (ICCV), 2023

Lorenzo Baraldi

180

23 Aug 2023

CgT-GAN: CLIP-guided Text GAN for Image Captioning

ACM Multimedia (ACM MM), 2023

217

23 Aug 2023

ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Future generations computer systems (FGCS), 2023

183

22 Aug 2023

Explore and Tell: Embodied Visual Captioning in 3D Environments

IEEE International Conference on Computer Vision (ICCV), 2023

Qin Jin

189

21 Aug 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

IEEE International Conference on Computer Vision (ICCV), 2023

Minsu Kim

Jeong Hun Yeo

J. Choi

Y. Ro

208

18 Aug 2023

Learning the meanings of function words from grounded language using a visual question answering model

Cognitive Sciences (CogSci), 2023

Eva Portelance

Michael C. Frank

Dan Jurafsky

NAI

271

16 Aug 2023

Visually-Aware Context Modeling for News Image Captioning

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Tingyu Qu

Tinne Tuytelaars

Marie-Francine Moens

VLM

130

16 Aug 2023