CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

International Conference on Learning Representations (ICLR), 2024
6 February 2024
Ji Qi
Ming Ding
Weihan Wang
Yushi Bai
Qingsong Lv
Wenyi Hong
Bin Xu
Lei Hou
Juanzi Li
Yuxiao Dong
Jie Tang
    VLM, LRM

Papers citing "CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning"

14 / 14 papers shown
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
H. Rasheed
Mohammed Zumri
Muhammad Maaz
Ming-Hsuan Yang
Fahad Shahbaz Khan
Salman Khan
AI4TS, LRM
164
0
0
28 Nov 2025
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Xingang Guo
Utkarsh Tyagi
Advait Gosai
Paula Vergara
Ernesto Gabriel Hernández Montoya
...
Bin Hu
Yunzhong He
Bing Liu
Rakshith S Srinivasa
VLM, LRM
325
3
0
14 Oct 2025
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Zhenlong Yuan
Xiangyan Qu
Chengxuan Qian
Rui Chen
Jing Tang
...
Xiangxiang Chu
Dapeng Zhang
Yiwei Wang
Y. Cai
Shuo Li
VLM, LRM
140
8
0
09 Oct 2025
Reinforced Visual Perception with Tools
Zetong Zhou
Dongping Chen
Zixian Ma
Zhihan Hu
Mingyang Fu
Sinan Wang
Yao Wan
Zhou Zhao
Ranjay Krishna
OffRL, VLM, LRM
155
11
0
01 Sep 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
352
8
0
24 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
279
4
0
08 Aug 2025
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback
Yang Chen
Yufan Shen
Wenxuan Huang
S. K. Zhou
Qunshu Lin
Xinyu Cai
Zhi Yu
Jiajun Bu
Ding Wang
Yu Qiao
OffRL, LRM
301
8
0
28 Jul 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li
Biao Yang
Qiang Liu
Shuo Zhang
Zhiyin Ma
Liang Yin
Linger Deng
Yabo Sun
Yuliang Liu
Xiang Bai
459
0
0
08 Jul 2025
MMGeoLM: Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models
Kai Sun
Yushi Bai
Zhen-Yi Yang
Jiajie Zhang
Ji Qi
Lei Hou
Juanzi Li
VLM
410
0
0
26 May 2025
FaceInsight: A Multimodal Large Language Model for Face Perception
Jingzhi Li
Changjiang Luo
Ruoyu Chen
Hua Zhang
Wenqi Ren
Jianhou Gan
Xiaochun Cao
CVBM, LRM
392
2
0
22 Apr 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
X. J. Yang
Jing Liu
Peng Wang
Guoqing Wang
Yue Yang
Mengqi Li
ObjD
491
4
0
27 Feb 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Xinze Wang
VLM, LRM
428
16
0
22 Feb 2025
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
629
47
0
24 Apr 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Jinqiao Wang
ObjD
294
26
0
14 Mar 2024