1504.00325
Microsoft COCO Captions: Data Collection and Evaluation Server
1 April 2015
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
Papers citing
"Microsoft COCO Captions: Data Collection and Evaluation Server"
50 / 1,515 papers shown
Generic Attention-model Explainability by Weighted Relevance Accumulation
ACM Multimedia Asia (MA), 2023
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
110
4
0
20 Aug 2023
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control
IEEE International Conference on Computer Vision (ICCV), 2023
Zi-Yuan Hu
Yanyang Li
Michael R. Lyu
Liwei Wang
VLM
149
23
0
18 Aug 2023
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
IEEE International Conference on Computer Vision (ICCV), 2023
Hangjie Yuan
Shiwei Zhang
Xiang Wang
Samuel Albanie
Yining Pan
Tao Feng
Jianwen Jiang
Dong Ni
Yingya Zhang
Deli Zhao
VLM
195
59
0
18 Aug 2023
Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage
Dario Cioni
Lorenzo Berlincioni
Federico Becattini
Marco Bertini
DiffM
112
13
0
14 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Neural Information Processing Systems (NeurIPS), 2023
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
123
21
0
11 Aug 2023
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination
ACM Multimedia (ACM MM), 2023
Haoxuan Li
Yi Bin
Junrong Liao
Yang Yang
Heng Tao Shen
161
42
0
08 Aug 2023
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval
ACM Multimedia (ACM MM), 2023
Yi Bin
Haoxuan Li
Yahui Xu
Xing Xu
Yang Yang
Heng Tao Shen
VOS
133
28
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Wenqi Shao
Yutao Hu
Shiyang Feng
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Jiaming Song
Yuning Qiao
Ping Luo
VLM
MLLM
187
24
0
07 Aug 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
European Conference on Computer Vision (ECCV), 2023
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
305
25
0
06 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
International Conference on Machine Learning (ICML), 2023
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
378
989
0
04 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Neural Information Processing Systems (NeurIPS), 2023
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
225
197
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
217
115
0
03 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Computer Vision and Image Understanding (CVIU), 2023
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
220
9
0
02 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
308
523
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
ACM Multimedia (ACM MM), 2023
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
130
18
0
02 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
141
60
0
31 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
122
1
0
27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
559
44
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
380
149
0
25 Jul 2023
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Jingxuan Wei
Cheng Tan
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Bihui Yu
R. Guo
Stan Z. Li
LRM
295
16
0
24 Jul 2023
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
Knowledge Discovery and Data Mining (KDD), 2023
Xinyue Hu
Lin Gu
Qi A. An
Mengliang Zhang
Liangchen Liu
Kazuma Kobayashi
Tatsuya Harada
Ronald M. Summers
Yingying Zhu
MedIm
177
49
0
22 Jul 2023
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
IEEE International Conference on Computer Vision (ICCV), 2023
Runjia Li
Shuyang Sun
Mohamed Elhoseiny
Juil Sock
163
16
0
21 Jul 2023
Divide & Bind Your Attention for Improved Generative Semantic Nursing
British Machine Vision Conference (BMVC), 2023
Yumeng Li
Margret Keuper
Dan Zhang
Anna Khoreva
DiffM
237
74
0
20 Jul 2023
Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap
Dejia Xu
Xingqian Xu
Wenyan Cong
Humphrey Shi
Zinan Lin
DiffM
132
4
0
20 Jul 2023
Improving Multimodal Datasets with Image Captioning
Neural Information Processing Systems (NeurIPS), 2023
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
187
121
0
19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
139
18
0
19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD
VLM
407
63
0
18 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLM
LRM
195
7
0
15 Jul 2023
Gloss Attention for Gloss-free Sign Language Translation
Computer Vision and Pattern Recognition (CVPR), 2023
Aoxiong Yin
Tianyun Zhong
Lilian H. Y. Tang
Weike Jin
Tao Jin
Zhou Zhao
SLR
171
60
0
14 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
European Conference on Computer Vision (ECCV), 2023
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Yuan Liu
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
480
1,578
0
12 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
281
154
0
11 Jul 2023
Semantic-SAM: Segment and Recognize Anything at Any Granularity
Feng Li
Hao Zhang
Pei Sun
Xueyan Zou
Siyi Liu
Jianwei Yang
Chun-yue Li
Lei Zhang
Jianfeng Gao
VLM
221
215
0
10 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
689
304
0
07 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
138
6
0
06 Jul 2023
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
International Conference on Learning Representations (ICLR), 2023
Pratyush Maini
Sachin Goyal
Zachary Chase Lipton
J. Zico Kolter
Aditi Raghunathan
VLM
127
40
0
06 Jul 2023
On the Cultural Gap in Text-to-Image Generation
European Conference on Artificial Intelligence (ECAI), 2023
Bingshuai Liu
Longyue Wang
Chenyang Lyu
Yong Zhang
Jinsong Su
Shuming Shi
Zhaopeng Tu
VLM
EGVM
115
12
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2023
Saurabh Pahune
Manoj Chandrasekharan
AILaw
142
26
0
05 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
230
88
0
05 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
172
5
0
05 Jul 2023
Visual Instruction Tuning with Polite Flamingo
AAAI Conference on Artificial Intelligence (AAAI), 2023
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
324
52
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Neural Information Processing Systems (NeurIPS), 2023
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Jiaming Song
280
160
0
03 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
263
4
0
03 Jul 2023
A Massive Scale Semantic Similarity Dataset of Historical English
Neural Information Processing Systems (NeurIPS), 2023
Emily Silcock
Melissa Dell
159
5
0
30 Jun 2023
CLIPAG: Towards Generator-Free Text-to-Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Roy Ganz
Michael Elad
VLM
178
14
0
29 Jun 2023
Towards Open Vocabulary Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
326
210
0
28 Jun 2023
Semi-supervised Multimodal Representation Learning through a Global Workspace
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Benjamin Devillers
Léopold Maytié
R. V. Rullen
SSL
118
10
0
27 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Weihao Ye
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
107
2
0
27 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
374
794
0
27 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
140
9
0
25 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Chunjiang Ge
Yulei Qin
Mengdan Zhang
...
Xing Sun
Zhenyu Qiu
Rongrong Ji
Caifeng Shan
Ran He
ELM
MLLM
645
1,170
0
23 Jun 2023