2209.09513
Cited By
v1
v2 (latest)
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Neural Information Processing Systems (NeurIPS), 2022
20 September 2022
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
Papers citing
"Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering"
50 / 1,273 papers shown
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-yang Liu
Chien-Yi Wang
Hongxu Yin
Pavlo Molchanov
Yu-Chiang Frank Wang
Kwang-Ting Cheng
Min-Hung Chen
770
676
0
14 Feb 2024
Higher Layers Need More LoRA Experts
Chongyang Gao
Kezhen Chen
Jinmeng Rao
Baochen Sun
Ruibo Liu
Daiyi Peng
Yawen Zhang
Xiaoyuan Guo
Jie Yang
V. Subrahmanian
MoE
208
83
0
13 Feb 2024
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Dongsheng Zhu
Xunzhu Tang
Weidong Han
Jinghui Lu
Yukun Zhao
Guoliang Xing
Junfeng Wang
D. Yin
VLM
MLLM
297
17
0
12 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Jiaming Song
Yu Qiao
Shiyang Feng
MLLM
523
139
0
08 Feb 2024
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
282
35
0
06 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo Zhang
Chunhua Shen
VLM
MLLM
238
149
0
06 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
International Conference on Machine Learning (ICML), 2024
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Chen Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
262
78
0
05 Feb 2024
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
374
6
0
05 Feb 2024
Copyright Protection in Generative AI: A Technical Perspective
Jie Ren
Han Xu
Pengfei He
Yingqian Cui
Shenglai Zeng
...
Hongzhi Wen
Jiayuan Ding
Hui Liu
Yi Chang
Shucheng Zhou
DeLMO
337
56
0
04 Feb 2024
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong
Ondrej Bohdal
Tingyang Yu
Yongxin Yang
Timothy M. Hospedales
VLM
MLLM
290
111
0
03 Feb 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
148
5
0
31 Jan 2024
SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models
Xiao Shao
Weifu Jiang
Fei Zuo
Mengqing Liu
LLMAG
226
13
0
31 Jan 2024
MouSi: Poly-Visual-Expert Vision-Language Models
Xiaoran Fan
Changzhi Sun
Changhao Jiang
Shuo Li
Senjie Jin
...
Tao Gui
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yunchun Jiang
VLM
159
24
0
30 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Sijin Yu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Yuan Liu
VLM
MLLM
370
344
0
29 Jan 2024
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Bin Lin
Zhenyu Tang
Yang Ye
Jiaxi Cui
Bin Zhu
...
Jinfa Huang
Junwu Zhang
Yatian Pang
Munan Ning
Li-ming Yuan
VLM
MLLM
MoE
439
269
0
29 Jan 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yue Fan
Jing Gu
Kai-Qing Zhou
Qianqi Yan
Shan Jiang
Ching-Chen Kuo
Xinze Guan
Xin Eric Wang
292
11
0
29 Jan 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Xue Sun
Xinya Wu
Pengfei Zhou
Richeng Xuan
Guang Liu
Xi Yang
Qiannan Zhu
Hua Huang
ELM
LRM
297
28
0
25 Jan 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hongliang He
Wenlin Yao
Kaixin Ma
Wenhao Yu
Yong Dai
Hongming Zhang
Zhenzhong Lan
Dong Yu
LLMAG
501
239
0
25 Jan 2024
Demystifying Chains, Trees, and Graphs of Thoughts
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Maciej Besta
Florim Memedi
Zhenyu Zhang
Robert Gerstenberger
Guangyuan Piao
...
Aleš Kubíček
H. Niewiadomski
Aidan O'Mahony
Onur Mutlu
Torsten Hoefler
AI4CE
LRM
1.0K
54
0
25 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
512
335
0
24 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
262
36
0
24 Jan 2024
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Debjyoti Mondal
Suraj Modi
Subhadarshi Panda
Rituraj Singh
Godawari Sudhakar Rao
LRM
179
78
0
23 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
255
3
0
18 Jan 2024
Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024
Yunshi Lan
Xinyuan Li
Hanyue Du
Xuesong Lu
Ming Gao
Weining Qian
Aoying Zhou
441
13
0
15 Jan 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
621
95
0
11 Jan 2024
REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Andrew Gritsevskiy
Arjun Panickssery
Aaron Kirtland
Derik Kauffman
Hans Gundlach
Irina Gritsevskaya
Joe Cavanagh
Jonathan Chiang
Lydia La Roux
Michelle Hung
ReLM
128
5
0
11 Jan 2024
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Shuofei Qiao
Ningyu Zhang
Runnan Fang
Yujie Luo
Wangchunshu Zhou
Yuchen Eleanor Jiang
Chengfei Lv
Huajun Chen
LLMAG
350
68
0
10 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
315
146
0
10 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
276
6
0
06 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
729
145
0
04 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
International Conference on Machine Learning (ICML), 2024
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
385
407
0
03 Jan 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2024
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
534
50
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
717
170
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
282
274
0
28 Dec 2023
MIVC: Multiple Instance Visual Component for Visual-Language Models
Wenyi Wu
Qi Li
Leon Wenliang Zhong
Junzhou Huang
201
4
0
28 Dec 2023
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
312
70
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
198
30
0
27 Dec 2023
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Zehui Chen
Weihua Du
Wenwei Zhang
Kuikun Liu
Jiangning Liu
...
Jingming Zhuo
Songyang Zhang
Dahua Lin
Kai-xiang Chen
Feng Zhao
LLMAG
ELM
396
59
0
21 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
371
419
0
20 Dec 2023
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou
Zhili Liu
Kai Chen
Lanqing Hong
Hang Xu
Aoxue Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MoE
MLLM
VLM
429
102
0
19 Dec 2023
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
Enze Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
582
53
0
17 Dec 2023
Decoding Concerns: Multi-label Classification of Vaccine Sentiments in Social Media
Somsubhra De
Shaurya Vats
189
2
0
17 Dec 2023
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
European Conference on Computer Vision (ECCV), 2023
Zhiyuan You
Zheyuan Li
Jinjin Gu
Zhenfei Yin
Tianfan Xue
Chao Dong
EGVM
400
91
0
14 Dec 2023
Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liqi He
Zuchao Li
Xiantao Cai
Ping Wang
LRM
193
34
0
14 Dec 2023
VILA: On Pre-training for Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ji Lin
Hongxu Yin
Ming-Yu Liu
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
Mohammad Shoeybi
Song Han
MLLM
VLM
641
681
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
401
197
0
11 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
460
12
0
11 Dec 2023
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
Michael Ranzinger
Greg Heinrich
Jan Kautz
Pavlo Molchanov
VLM
820
121
0
10 Dec 2023
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Shitian Zhao
Zhuowan Li
Yadong Lu
Yaoyao Liu
Yan Wang
LRM
191
14
0
09 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Computer Vision and Pattern Recognition (CVPR), 2023
Mohammad Reza Taesiri
Tianjun Feng
Anh Totti Nguyen
Cor-Paul Bezemer
MLLM
VLM
LRM
328
18
0
08 Dec 2023