1912.03098
Cited By
v4 (latest)
Connecting Vision and Language with Localized Narratives
European Conference on Computer Vision (ECCV), 2019
6 December 2019
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
Papers citing "Connecting Vision and Language with Localized Narratives"
50 / 199 papers shown
Taming Self-Training for Open-Vocabulary Object Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
S. Schulter
Long Zhao
Zhixing Zhang
Vijay Kumar B.G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
VLM
ObjD
318
21
0
11 Aug 2023
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
Chinmay Hegde
OOD
221
2
0
07 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Neural Information Processing Systems (NeurIPS), 2023
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
273
197
0
04 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
IEEE International Conference on Computer Vision (ICCV), 2023
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
289
19
0
31 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
408
151
0
25 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
266
88
0
05 Jul 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
270
14
0
28 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Neural Information Processing Systems (NeurIPS), 2023
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
664
191
0
20 Jun 2023
Estimating Generic 3D Room Structures from 2D Annotations
Neural Information Processing Systems (NeurIPS), 2023
D. Rozumnyi
S. Popov
Kevis-Kokitsi Maninis
Matthias Nießner
V. Ferrari
3DV
3DPC
223
8
0
15 Jun 2023
Vocabulary-free Image Classification
Neural Information Processing Systems (NeurIPS), 2023
Alessandro Conti
Enrico Fini
Goran Frehse
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
431
32
0
01 Jun 2023
Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Pablo Pernias
Dominic Rampas
Mats L. Richter
Christopher Pal
Marc Aubreville
DiffM
VLM
197
49
0
01 Jun 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
247
0
0
31 May 2023
Translation-Enhanced Multilingual Text-to-Image Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yaoyiran Li
Ching-Yun Chang
Stephen Rawls
Ivan Vulić
Anna Korhonen
179
12
0
30 May 2023
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
IEEE Robotics and Automation Letters (RA-L), 2023
Hongkuan Zhou
Zhenshan Bing
Xiangtong Yao
Xiaojie Su
Chenguang Yang
Kai-Qi Huang
Alois C. Knoll
LM&Ro
229
25
0
30 May 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siqi Liu
Weixi Feng
Tsu-Jui Fu
Wenhu Chen
Wenjie Wang
VLM
290
21
0
23 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
157
5
0
23 May 2023
Consensus and Subjectivity of Skin Tone Annotation for ML Fairness
Neural Information Processing Systems (NeurIPS), 2023
Candice Schumann
Gbolahan O. Olanubi
Auriel Wright
Ellis P. Monk
Courtney Heldreth
Susanna Ricco
261
32
0
16 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
418
124
0
04 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
213
2
0
03 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
424
2,662
0
20 Apr 2023
Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
IEEE International Conference on Computer Vision (ICCV), 2023
Cong Han
Yujie Zhong
Dengjie Li
Kai Han
Lin Ma
VLM
SSeg
227
42
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
475
967
0
03 Apr 2023
Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts
IEEE International Conference on Computer Vision (ICCV), 2023
Dongyoon Han
Junsuk Choe
Dante Chun
John Joon Young Chung
Minsuk Chang
Sangdoo Yun
Jean Y. Song
Seong Joon Oh
OOD
1.3K
4
1
30 Mar 2023
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Minsoo Kang
Doyup Lee
Jiseob Kim
Saehoon Kim
Bohyung Han
DRL
OOD
178
4
0
28 Mar 2023
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Seokju Cho
Heeseong Shin
Sung‐Jin Hong
Anurag Arnab
Paul Hongsuck Seo
Seung Wook Kim
VLM
280
176
0
21 Mar 2023
Open-vocabulary Panoptic Segmentation with Embedding Modulation
IEEE International Conference on Computer Vision (ICCV), 2023
Xi Chen
Shuang Li
Ser-Nam Lim
Antonio Torralba
Hengshuang Zhao
VLM
163
38
0
20 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
212
122
0
12 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
International Journal of Computer Vision (IJCV), 2023
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
164
11
0
11 Mar 2023
Connecting Vision and Language with Video Localized Narratives
Computer Vision and Pattern Recognition (CVPR), 2023
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
274
30
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
404
268
0
20 Feb 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM
ObjD
147
55
0
23 Jan 2023
Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation
S. D. Dao
Hengcan Shi
Dinh Q. Phung
Jianfei Cai
VLM
108
1
0
18 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
163
2
0
17 Jan 2023
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
306
100
0
05 Jan 2023
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Jishnu Mukhoti
Tsung-Yu Lin
Omid Poursaeed
Rui Wang
Ashish Shah
Juil Sock
Ser-Nam Lim
VLM
231
116
0
09 Dec 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
246
5
0
26 Nov 2022
Shifted Diffusion for Text-to-image Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiuxiang Gu
Bingchen Liu
Yizhe Zhu
Xiao Yang
Changyou Chen
Jinhui Xu
DiffM
283
57
0
24 Nov 2022
ReCo: Region-Controlled Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Linjie Li
Kevin Qinghong Lin
...
Nan Duan
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
DiffM
223
187
0
23 Nov 2022
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
Vitali Petsiuk
Alexander E. Siemenn
Saisamrit Surbehera
Zad Chin
Keith Tyser
...
Ori Kerret
Tonio Buonassisi
Kate Saenko
Armando Solar-Lezama
Iddo Drori
VLM
101
45
0
22 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
191
9
0
15 Nov 2022
Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions
Michele Cafagna
Kees van Deemter
Albert Gatt
CoGe
140
4
0
09 Nov 2022
From colouring-in to pointillism: revisiting semantic segmentation supervision
Rodrigo Benenson
V. Ferrari
VLM
190
25
0
25 Oct 2022
Lafite2: Few-shot Text-to-Image Generation
Jiuxiang Gu
Chunyuan Li
Changyou Chen
Jianfeng Gao
Jinhui Xu
DiffM
172
14
0
25 Oct 2022
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yu Zhao
Jianguo Wei
Zhichao Lin
Yueheng Sun
Meishan Zhang
Hao Fei
141
17
0
20 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
764
4,476
0
16 Oct 2022
Caption supervision enables robust learners
Ben Feuer
Ameya Joshi
Chinmay Hegde
SSL
CLIP
VLM
182
3
0
13 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Computer Vision and Pattern Recognition (CVPR), 2022
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
149
24
0
04 Oct 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
Computer Vision and Pattern Recognition (CVPR), 2022
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
VLM
168
120
0
30 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
225
8
0
12 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM
ViT
172
12
0
09 Sep 2022