Connecting Vision and Language with Localized Narratives

European Conference on Computer Vision (ECCV), 2019

6 December 2019

Papers citing "Connecting Vision and Language with Localized Narratives"

Pre-training image-language transformers for open-vocabulary tasks

09 Sep 2022

Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

Louis-Philippe Morency

17 Aug 2022

Layout-Bridging Text-to-Image Synthesis

12 Aug 2022

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

ACM Multimedia (ACM MM), 2022

Junshi Huang

11 Aug 2022

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

European Conference on Computer Vision (ECCV), 2022

Diyi Yang

05 Aug 2022

Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

Taehyeong Kim, H. Song, Byoung-Tak Zhang

31 Jul 2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

...

22 Jun 2022

Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

25 May 2022

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

02 May 2022

Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Dan Oneaţă, H. Cucu

27 Apr 2022

SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

Computer Vision and Pattern Recognition (CVPR), 2022

Pinaki Nath Chowdhury

25 Apr 2022

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

Computer Vision and Pattern Recognition (CVPR), 2022

15 Apr 2022

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

European Conference on Computer Vision (ECCV), 2022

12 Apr 2022

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Computer Vision and Pattern Recognition (CVPR), 2022

Amanpreet Singh, Douwe Kiela

07 Apr 2022

KNN-Diffusion: Image Generation via Large-Scale Retrieval

International Conference on Learning Representations (ICLR), 2022

06 Apr 2022

DT2I: Dense Text-to-Image Generation from Region Descriptions

International Conference on Artificial Neural Networks (ICANN), 2022

05 Apr 2022

Keyword localisation in untranscribed speech using visually grounded speech models

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Kayode Olaleye, Dan Oneaţă, Herman Kamper

02 Feb 2022

Deep Learning Approaches on Image Captioning: A Review

ACM Computing Surveys (ACM CSUR), 2022

31 Jan 2022

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

European Conference on Computer Vision (ECCV), 2021

22 Dec 2021

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

09 Dec 2021

FLAVA: A Foundational Language And Vision Alignment Model

Amanpreet Singh, Douwe Kiela

08 Dec 2021

Object-Centric Unsupervised Image Captioning

Ser-Nam Lim

02 Dec 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2021

Jiuxiang Gu

27 Nov 2021

Less is More: Generating Grounded Navigation Instructions from Landmarks

25 Nov 2021

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

A. Maharana, Joey Tianyi Zhou

21 Oct 2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

14 Oct 2021

What Vision-Language Models 'See' when they See Scenes

15 Sep 2021

Panoptic Narrative Grounding

IEEE International Conference on Computer Vision (ICCV), 2021

10 Sep 2021

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

IEEE International Conference on Computer Vision (ICCV), 2021

Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han

26 Aug 2021

From Show to Tell: A Survey on Deep Learning-based Image Captioning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

Lorenzo Baraldi

14 Jul 2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jing Liu, ...

01 Jul 2021

A Picture May Be Worth a Hundred Words for Visual Question Answering

25 Jun 2021

Bridging the Gap Between Object Detection and User Intent via Query-Modulation

18 Jun 2021

Connecting What to Say With Where to Look by Modeling Human Attention Traces

Computer Vision and Pattern Recognition (CVPR), 2021

Babak Damavandi

12 May 2021

Concadia: Towards Image-Based Text Generation with a Purpose

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

16 Apr 2021

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

Interspeech (Interspeech), 2021

05 Apr 2021

PanGEA: The Panoramic Graph Environment Annotation Toolkit

23 Mar 2021

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Computer Vision and Pattern Recognition (CVPR), 2021

Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu

22 Mar 2021

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

Computer Vision and Pattern Recognition (CVPR), 2021

17 Feb 2021

Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval

IEEE International Conference on Computer Vision (ICCV), 2021

09 Feb 2021

Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

Transactions of the Association for Computational Linguistics (TACL), 2021

Lisa Anne Hendricks, John F. J. Mellor, R. Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh

31 Jan 2021

Adversarial Text-to-Image Synthesis: A Review

Neural Networks (NN), 2021

25 Jan 2021

ArtEmis: Affective Language for Visual Art

Computer Vision and Pattern Recognition (CVPR), 2021

19 Jan 2021

Cross-Modal Contrastive Learning for Text-to-Image Generation

Computer Vision and Pattern Recognition (CVPR), 2021

12 Jan 2021

StacMR: Scene-Text Aware Cross-Modal Retrieval

Andrés Mafla, Rafael Sampaio de Rezende

08 Dec 2020

Understanding Guided Image Captioning Performance across Domains

Conference on Computational Natural Language Learning (CoNLL), 2020

04 Dec 2020

Text-to-Image Generation Grounded by Fine-Grained User Attention

07 Nov 2020

Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

15 Oct 2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Hao Tan, Joey Tianyi Zhou

14 Oct 2020

Fine-Grained Grounding for Multimodal Speech Recognition

Findings (Findings), 2020

05 Oct 2020