2308.09970
Tackling Vision Language Tasks Through Learning Inner Monologues
19 August 2023
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Y. Zhang
MLLM
Papers citing
"Tackling Vision Language Tasks Through Learning Inner Monologues"
11 / 11 papers shown
Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing
Diji Yang
Linda Zeng
Jinmeng Rao
Y. Zhang
20
0
0
05 May 2025
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Li Liu
Diji Yang
Sijia Zhong
Kalyana Suma Sree Tholeti
Lei Ding
Yi Zhang
Leilani H. Gilpin
26
0
0
01 Nov 2024
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
Diji Yang
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yi Zhang
LRM
RALM
25
4
0
15 May 2024
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
Junchen Fu
Xuri Ge
Xin Xin
Alexandros Karatzoglou
Ioannis Arapakis
Jie Wang
Joemon M. Jose
22
17
0
02 Apr 2024
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
Jiaying Lu
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Baochen Sun
Carl Yang
Jie Yang
11
9
0
07 Sep 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
161
401
0
10 Sep 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019