arXiv:1504.00325
Microsoft COCO Captions: Data Collection and Evaluation Server
1 April 2015
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
Papers citing "Microsoft COCO Captions: Data Collection and Evaluation Server"
50 / 1,515 papers shown
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
116
5
0
08 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Neural Information Processing Systems (NeurIPS), 2023
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
306
62
0
07 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Huayu Chen
Tuo Zhao
Boqing Gong
Tianyi Zhou
158
12
0
06 Oct 2023
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
International Conference on Human Factors in Computing Systems (CHI), 2023
Brett A. Halperin
S. Lukin
CoGe
165
29
0
06 Oct 2023
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
VLM
229
19
0
04 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
ACM Multimedia (ACM MM), 2023
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
226
15
0
04 Oct 2023
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
156
4
0
03 Oct 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
International Conference on Learning Representations (ICLR), 2023
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP
VLM
205
98
0
02 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
252
5
0
02 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
International Conference on Learning Representations (ICLR), 2023
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
141
174
0
02 Oct 2023
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
International Conference on Learning Representations (ICLR), 2023
Zixiang Chen
Yihe Deng
Yuanzhi Li
Quanquan Gu
VLM
301
16
0
02 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
European Conference on Computer Vision (ECCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
198
6
0
29 Sep 2023
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
International Conference on Learning Representations (ICLR), 2023
Amita Gajewar
Paul Vicol
G. Bansal
David J Fleet
219
275
0
29 Sep 2023
YOLOR-Based Multi-Task Learning
Hung-Shuo Chang
Chien-Yao Wang
Hang Yan
Yukun Zhu
Hongpeng Liao
MoE
VLM
157
20
0
29 Sep 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
161
36
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Yuan Liu
MLLM
610
299
0
26 Sep 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
Neural Information Processing Systems (NeurIPS), 2023
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
168
15
0
26 Sep 2023
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
International Conference on Learning Representations (ICLR), 2023
Haoning Wu
Zicheng Zhang
Erli Zhang
Chaofeng Chen
Liang Liao
...
Chunyi Li
Wenxiu Sun
Qiong Yan
Guangtao Zhai
Weisi Lin
VLM
296
215
0
25 Sep 2023
Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment
British Machine Vision Conference (BMVC), 2023
Sina Malakouti
Adriana Kovashka
ObjD
149
2
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
252
20
0
23 Sep 2023
Detect Everything with Few Examples
Conference on Robot Learning (CoRL), 2023
Xinyu Zhang
Yuting Wang
Abdeslam Boularias
ObjD
VLM
269
22
0
22 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
168
10
0
21 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
112
3
0
15 Sep 2023
Looking at words and points with attention: a benchmark for text-to-shape coherence
Andrea Amaduzzi
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
116
3
0
14 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
253
22
0
14 Sep 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
124
4
0
14 Sep 2023
ITI-GEN: Inclusive Text-to-Image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Zhang
Xuanbai Chen
Siqi Chai
Chen Henry Wu
Dmitry Lagun
Thabo Beeler
Fernando de la Torre
VLM
207
75
0
11 Sep 2023
Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yabing Wang
Shuhui Wang
Hao Luo
Jianfeng Dong
F. Wang
Meng Han
Xun Wang
Meng Wang
170
13
0
11 Sep 2023
DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
International Conference on Learning Representations (ICLR), 2023
Zhengxiang Shi
Aldo Lipani
VLM
403
41
0
11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Shiyang Feng
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
232
148
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
IEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
139
3
0
07 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
191
160
0
05 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
149
11
0
05 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
192
8
0
05 Sep 2023
Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation
Asian Conference on Computer Vision (ACCV), 2023
Ryota Yoshihashi
Yuya Otsuka
Kenji Doi
Tomohiro Tanaka
Hirokatsu Kataoka
370
4
0
04 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
262
35
0
02 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Shiyang Feng
Xianzhi Li
Jiaming Song
Pheng-Ann Heng
MLLM
312
184
0
01 Sep 2023
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
International Conference on Learning Representations (ICLR), 2023
Ashmit Khandelwal
Aditya Agrawal
Aanisha Bhattacharyya
Yaman Kumar Singla
Somesh Singh
...
Ishita Dasgupta
Stefano Petrangeli
R. Shah
Changyou Chen
Balaji Krishnamurthy
275
10
0
01 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Joshua Forster Feinglass
Yezhou Yang
132
2
0
01 Sep 2023
TouchStone: Evaluating Vision-Language Models by Language Models
Shuai Bai
Shusheng Yang
Jinze Bai
Peng Wang
Xing Zhang
Junyang Lin
Xinggang Wang
Chang Zhou
Jingren Zhou
MLLM
226
56
0
31 Aug 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
169
8
0
30 Aug 2023
Evaluation and Analysis of Hallucination in Large Vision-Language Models
Junyan Wang
Yi Zhou
Guohai Xu
Pengcheng Shi
Chenlin Zhao
...
Mingshi Yan
Ji Zhang
Jihua Zhu
Jitao Sang
Haoyu Tang
MLLM
212
90
0
29 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Lin Geng Foo
Hossein Rahmani
Jing Liu
636
42
0
27 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
188
18
0
25 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
417
1,488
0
24 Aug 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ziyan Yang
Kushal Kafle
Zhe Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
188
1
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
71
6
0
24 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
IEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
182
22
0
23 Aug 2023
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Chongyan Chen
Samreen Anjum
Danna Gurari
202
15
0
21 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
IEEE International Conference on Computer Vision (ICCV), 2023
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
157
3
0
21 Aug 2023