CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

International Conference on Learning Representations (ICLR), 2023
6 November 2023
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
    MLLM

Papers citing "CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding"

15 / 15 papers shown
VisualChef: Generating Visual Aids in Cooking via Mask Inpainting
Oleh Kuzyk
Zuoyue Li
Marc Pollefeys
Xi Wang
111
0
0
23 Jun 2025
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao
Zijun Wei
Jason Kuen
Kangning Liu
Lingzhi Zhang
Jiuxiang Gu
HyunJoon Jung
Liang-Yan Gui
Yu Wang
VLM
270
2
0
05 Jun 2025
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy
Duy Anh Huynh
Yutong Xie
Yuankai Qi
Qi Chen
...
Anton van den Hengel
Zhibin Liao
Minh-Son To
Johan Verjans
Vu Minh Hieu Phan
308
2
0
21 May 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
270
1
0
10 Apr 2025
Is CLIP ideal? No. Can we fix it? Yes!
Raphi Kang
Yue Song
Georgia Gkioxari
Pietro Perona
VLM
276
4
0
10 Mar 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
X. J. Yang
Jing Liu
Peng Wang
Guoqing Wang
Yue Yang
Jikang Cheng
ObjD
406
2
0
27 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Yang Zhou
Yan Peng
LRM
ReLM
356
1
0
02 Feb 2025
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
ACM Multimedia (MM), 2024
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
271
10
0
08 Dec 2024
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao
Liang-Yan Gui
Yu Wang
169
5
0
10 Oct 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junzhuo Liu
Xiaohu Yang
Weiwei Li
Peng Wang
ObjD
319
11
0
23 Sep 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
213
17
0
18 Jul 2024
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed
Jian Ding
Eslam Mohamed Bakr
Mohamed Elhoseiny
215
4
0
29 May 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
International Conference on Machine Learning (ICML), 2024
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
211
200
0
14 Mar 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Vasu Sharma
VLM
447
64
0
20 Feb 2024
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
213
18
0
10 Nov 2023