Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.04387
Cited By
M
3
^3
3
IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
7 June 2023
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
Shuhuai Ren
Mukai Li
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning"
22 / 22 papers shown
Title
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
48
0
0
04 May 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
H. Shen
63
1
0
11 Mar 2025
A Zero-Shot Learning Approach for Ephemeral Gully Detection from Remote Sensing using Vision Language Models
Seyed Mohamad Ali Tousi
Ramy M. A. Farag
Jacket Demby's
Gbenga Omotara
John A. Lory
Guilherme N. DeSouza
89
0
0
03 Mar 2025
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
68
4
0
08 Dec 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
44
0
0
09 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
77
25
0
04 Oct 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
34
5
0
29 Jun 2024
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
19
1
0
27 Jun 2024
M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models
Hongyu Wang
Jiayu Xu
Senwei Xie
Ruiping Wang
Jialin Li
Zhaojie Xie
Bin Zhang
Chuyan Xiong
Xilin Chen
ELM
VLM
LRM
86
7
0
24 May 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Ajmal Saeed Mian
29
2
0
21 May 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
71
17
0
07 Apr 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
45
3
0
07 Mar 2024
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen
Zequn Jie
Lin Ma
MoE
38
46
0
29 Jan 2024
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
49
20
0
28 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
57
22
0
02 Nov 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
24
92
0
27 Sep 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
29
77
0
12 Aug 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
20
42
0
03 Jul 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
579
0
06 Apr 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
207
1,101
0
20 Sep 2022
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
177
515
0
26 Jan 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
282
39,170
0
01 Sep 2014
1