ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Microsoft COCO Captions: Data Collection and Evaluation Server

1 April 2015
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
arXiv:1504.00325

Papers citing "Microsoft COCO Captions: Data Collection and Evaluation Server"

50 / 1,387 papers shown
Title
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
Xi Chen
Haosen Yang
Sheng Jin
Xiatian Zhu
H. Yao
VLM
29
3
0
05 Sep 2024
A New People-Object Interaction Dataset and NVS Benchmarks
Shuai Guo
Houqiang Zhong
Q. Wang
Ziyu Chen
Yijie Gao
Jiajing Yuan
Chenyu Zhang
Rong Xie
Li-Na Song
33
0
0
03 Sep 2024
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
40
0
0
03 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
24
1
0
02 Sep 2024
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
Spencer Whitehead
Jacob Phillips
Sean Hendryx
31
0
0
30 Aug 2024
Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution
Yixin Wu
Yun Shen
Michael Backes
Yang Zhang
42
1
0
30 Aug 2024
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
31
1
0
24 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLM
VLM
82
13
0
23 Aug 2024
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Qizhou Chen
Taolin Zhang
Chengyu Wang
Xiaofeng He
Dakan Wang
Tingting Liu
KELM
51
2
0
19 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
40
10
0
17 Aug 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
48
11
0
15 Aug 2024
Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
Sungyeon Kim
Boseung Jeong
Donghyun Kim
Suha Kwak
VLM
26
2
0
11 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
45
2
0
07 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
50
16
0
06 Aug 2024
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge
Zihan Li
Diping Song
Zefeng Yang
Deming Wang
Fei Li
Xiulan Zhang
P. E. Kinahan
Yu Qiao
VLM
LM&MA
18
3
0
05 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
19
2
0
05 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
38
1
0
29 Jul 2024
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen
Pengfei Zhang
Kai Ye
Wei Dong
Xin Feng
Yana Zhang
41
0
0
28 Jul 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
36
3
0
28 Jul 2024
SWIFT: Semantic Watermarking for Image Forgery Thwarting
Gautier Evennou
Vivien Chappelier
Ewa Kijak
Teddy Furon
45
1
0
26 Jul 2024
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning
Xinwei Liu
Xiaojun Jia
Yuan Xun
Siyuan Liang
Xiaochun Cao
39
7
0
23 Jul 2024
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
Wenbin An
Feng Tian
Jiahao Nie
Wenkai Shi
Haonan Lin
Yan Chen
Qianying Wang
Y. Wu
Guang Dai
Ping Chen
VLM
45
4
0
22 Jul 2024
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
48
0
0
21 Jul 2024
EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension
Wei Zhang
Miaoxin Cai
Tong Zhang
Jun Li
Zhuang Yin
Xuerui Mao
61
5
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
40
6
0
17 Jul 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
40
5
0
16 Jul 2024
QVD: Post-training Quantization for Video Diffusion Models
Shilong Tian
Hong Chen
Chengtao Lv
Yu Liu
Jinyang Guo
Xianglong Liu
Shengxi Li
Hao Yang
Tao Xie
VGen
MQ
46
2
0
16 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
36
114
0
16 Jul 2024
Evaluating Model Bias Requires Characterizing its Mistakes
Isabela Albuquerque
Jessica Schrouff
David Warde-Farley
Ali Taylan Cemgil
Sven Gowal
Olivia Wiles
47
2
0
15 Jul 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
32
6
0
11 Jul 2024
Vision-Language Models under Cultural and Inclusive Considerations
Antonia Karamolegkou
Phillip Rust
Yong Cao
Ruixiang Cui
Anders Søgaard
Daniel Hershcovich
VLM
51
7
0
08 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
49
5
0
06 Jul 2024
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
Sohaib Ahmad
Hui Guan
Ramesh K. Sitaraman
40
4
0
04 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
53
2
0
01 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
35
5
0
01 Jul 2024
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu
Fei Wang
Sheng Zhang
Hoifung Poon
Muhao Chen
34
6
0
01 Jul 2024
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Jinming Li
Yichen Zhu
Zhiyuan Xu
Jindong Gu
Minjie Zhu
Xin Liu
Ning Liu
Yaxin Peng
Feifei Feng
Jian Tang
LRM
LM&Ro
33
6
0
28 Jun 2024
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Zhijie Nie
Richong Zhang
Zhangchi Feng
Hailang Huang
Xudong Liu
32
1
0
26 Jun 2024
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
34
0
0
25 Jun 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
36
1
0
23 Jun 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota
Ryo Hachiuma
Chao-Han Huck Yang
Yuta Nakashima
VLM
33
3
0
20 Jun 2024
Evaluating Numerical Reasoning in Text-to-Image Models
Ivana Kajić
Olivia Wiles
Isabela Albuquerque
Matthias Bauer
Su Wang
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
ReLM
87
0
0
20 Jun 2024
Learnable In-Context Vector for Visual Question Answering
Yingzhe Peng
Chenduo Hao
Xu Yang
Jiawei Peng
Xinting Hu
Xin Geng
37
4
0
19 Jun 2024
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
Eva Portelance
Siva Reddy
Timothy J. O'Donnell
16
2
0
17 Jun 2024
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo
Yibo Yan
Boren Hu
Yutao Yue
Xuming Hu
LRM
MLLM
37
7
0
17 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
43
26
0
16 Jun 2024
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan
Dongyue Wu
Guilin Zhu
Yuanjie Shao
Nong Sang
Changxin Gao
VLM
29
15
0
14 Jun 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
46
4
0
13 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
56
20
0
12 Jun 2024