arXiv:1504.00325 (latest version: v2)
Microsoft COCO Captions: Data Collection and Evaluation Server
1 April 2015
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
Papers citing "Microsoft COCO Captions: Data Collection and Evaluation Server"
Showing 50 of 1,519 citing papers
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zichen Miao
Wei Chen
Qiang Qiu
271
7
0
24 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
266
1
0
22 Mar 2025
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zenghui Yuan
Jiawen Shi
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
AAML
451
8
0
20 Mar 2025
UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation
Yaxiong Chen
Chuang Du
Chunlei Li
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
MedIm
295
1
0
20 Mar 2025
Deeply Supervised Flow-Based Generative Models
Inkyu Shin
Chenglin Yang
Liang-Chieh Chen
444
6
0
18 Mar 2025
Can Large Vision Language Models Read Maps Like a Human?
Shuo Xing
Zezhou Sun
Shuangyu Xie
Kaiyuan Chen
Yanjia Huang
Yuping Wang
Jiachen Li
Dezhen Song
Zhengzhong Tu
372
20
0
18 Mar 2025
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
383
0
0
17 Mar 2025
Scale Efficient Training for Large Datasets
Computer Vision and Pattern Recognition (CVPR), 2025
Qing Zhou
Junyu Gao
Qi Wang
DD
324
3
0
17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
292
5
0
15 Mar 2025
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang
Qixiang Zhang
Lehan Wang
Xuanqi Huang
Xiaomeng Li
VOS
VGen
296
3
0
14 Mar 2025
RONA: Pragmatically Diverse Image Captioning with Coherence Relations
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
309
3
0
14 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Lei Ma
Mamba
303
19
0
14 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
International Conference on Learning Representations (ICLR), 2025
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
1.0K
4
0
14 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
524
12
0
13 Mar 2025
Teaching LMMs for Image Quality Scoring and Interpreting
Zicheng Zhang
H. Wu
Ziheng Jia
Weisi Lin
Guoquan Zheng
410
8
0
12 Mar 2025
Scaling Laws for Conditional Emergence of Multilingual Image Captioning via Generalization from Translation
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
563
0
0
12 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
283
0
0
11 Mar 2025
LongProLIP: A Probabilistic Vision-Language Model with Long Context Text
Sanghyuk Chun
Sangdoo Yun
VLM
304
2
0
11 Mar 2025
Stick to Facts: Towards Fidelity-oriented Product Description Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Zhangming Chan
Preslav Nakov
Yongliang Wang
Jia-Nan Li
Qing Cui
Kun Gai
Dongyan Zhao
Rui Yan
324
25
0
11 Mar 2025
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
Rahul Nair
Bhanu Tokas
Neel Shah
391
0
0
10 Mar 2025
Task-Agnostic Attacks Against Vision Foundation Models
Brian Pulfer
Yury Belousov
Vitaliy Kinakh
Teddy Furon
S. Voloshynovskiy
AAML
229
0
0
05 Mar 2025
Are Large Vision Language Models Good Game Players?
International Conference on Learning Representations (ICLR), 2025
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
ELM
LRM
245
13
0
04 Mar 2025
Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation
Zhichao Yang
Leida Li
Pengfei Chen
Jinjian Wu
Giuseppe Valenzise
233
3
0
04 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Nianzu Yang
Yun Zheng
Liwei Wang
ObjD
VLM
448
12
0
03 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Medical Image Analysis (MedIA), 2025
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
269
0
0
03 Mar 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoyi Liu
Huan Zhang
AAML
694
7
0
25 Feb 2025
Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yi-Kai Zhang
De-Chuan Zhan
Han-Jia Ye
ALM
ELM
LRM
451
15
0
24 Feb 2025
Fine-Grained Captioning of Long Videos through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
595
1
0
23 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
508
0
0
20 Feb 2025
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Nianzu Yang
Yinglu Li
Zuan Gao
Yun Zheng
Hongtao Xie
VLM
CoGe
462
0
0
19 Feb 2025
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
Tiancheng Gu
Kaicheng Yang
Chaoyi Zhang
Yin Xie
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
Jiankang Deng
CLIP
VLM
495
5
0
18 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
397
10
0
18 Feb 2025
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Taeyoung Yun
Dinghuai Zhang
Jinkyoo Park
Ling Pan
DiffM
297
12
0
17 Feb 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
LRM
373
3
0
17 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ze Liu
Junjie Zhou
Yueze Wang
Zheng Liu
Defu Lian
OffRL
960
4
0
17 Feb 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
International Conference on Learning Representations (ICLR), 2025
Zhenfang Chen
Delin Chen
Rui Sun
Wenjun Liu
Chuang Gan
LLMAG
306
12
0
17 Feb 2025
Pixel-Level Reasoning Segmentation via Multi-turn Conversations
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dexian Cai
Xiaocui Yang
Yongkang Liu
Daling Wang
Shi Feng
Yifei Zhang
Soujanya Poria
LRM
335
3
0
13 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
AAAI Conference on Artificial Intelligence (AAAI), 2025
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
373
4
0
28 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
AAAI Conference on Artificial Intelligence (AAAI), 2025
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
209
0
0
20 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Computer Vision and Pattern Recognition (CVPR), 2024
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Yuan Liu
Kaipeng Zhang
Dahua Lin
Yu Qiao
Shiyang Feng
Xiangyu Yue
MLLM
552
191
0
10 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
143
0
0
07 Jan 2025
A Novel Shape Guided Transformer Network for Instance Segmentation in Remote Sensing Images
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
Dawen Yu
Shunping Ji
ViT
286
5
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
787
118
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Computer Vision and Pattern Recognition (CVPR), 2024
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Celine Lee
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
272
38
0
31 Dec 2024
B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Hao Zhang
Wenqi Shao
Hong Liu
Yongqiang Ma
Ping Luo
Yu Qiao
Kaipeng Zhang
Jianchao Tan
VLM
AAML
214
10
0
31 Dec 2024
Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation
International Conference on Artificial Neural Networks (ICANN), 2024
J. Zhang
Li Zhang
Shijian Li
VLM
382
0
0
18 Dec 2024
Adversarial Hubness in Multi-Modal Retrieval
Tingwei Zhang
Fnu Suya
Rishi Jha
Collin Zhang
Vitaly Shmatikov
AAML
572
3
0
18 Dec 2024
From Simple to Professional: A Combinatorial Controllable Image Captioning Agent
Xinran Wang
Muxi Diao
Baoteng Li
Hao Zhang
Kongming Liang
Tianhao Shen
MLLM
CLIP
286
0
0
15 Dec 2024
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Neural Information Processing Systems (NeurIPS), 2024
Dong Hoon Lee
Seunghoon Hong
230
10
0
13 Dec 2024
DocVLM: Make Your VLM an Efficient Reader
Computer Vision and Pattern Recognition (CVPR), 2025
Mor Shpigel Nacson
Aviad Aberdam
Roy Ganz
Elad Ben Avraham
Alona Golts
Yair Kittenplon
Shai Mazor
Ron Litman
VLM
629
0
0
11 Dec 2024