2408.16224
Cited By
LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models
29 August 2024
Jingyi Wang, Jianzhong Ju, Jian Luan, Zhidong Deng
VLM
Papers citing "LLaVA-SG: Leveraging Scene Graphs as Visual Semantic Expression in Vision-Language Models"
3 / 3 papers shown
Title
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang, Shi-Yong Chen, Zhenxing Zhou, Siyi Li, Haoran Li, Wengang Zhou, H. Li
VLM
61 | 3 | 0
24 Nov 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye, Mingshi Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
MLLM, VLM
116 | 367 | 0
07 Nov 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
ELM, ReLM, LRM
198 | 1,089 | 0
20 Sep 2022