Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2105.04165
Cited By
v1
v2
v3 (latest)
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
10 May 2021
Pan Lu
Ran Gong
Shibiao Jiang
Liang Qiu
Siyuan Huang
Xiaodan Liang
Song-Chun Zhu
AIMat
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"
50 / 238 papers shown
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions
Jo-Ku Cheng
Zeren Zhang
Ran Chen
Jingyang Deng
Ziran Qin
Jinwen Ma
330
6
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
672
829
1
14 Apr 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yikun Wang
Siyin Wang
Qinyuan Cheng
Zhaoye Fei
Liang Ding
Qipeng Guo
Dacheng Tao
Jiaqi Leng
LRM
289
23
0
12 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
386
2
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM
VLM
302
1
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xinze Wang
Zhiyong Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
LRM
VLM
599
75
0
10 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
436
14
0
03 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
495
44
0
31 Mar 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
473
1
0
28 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
552
0
0
26 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jinfa Huang
Jie Lou
Debing Zhang
Rongrong Ji
502
6
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
429
15
0
24 Mar 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Philip S. Yu
Xuming Hu
Qingsong Wen
721
23
0
23 Mar 2025
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng
Jingyi Wang
Wenhao Jiang
Zhong Ming
VLM
LRM
290
3
0
19 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
Lawrence Yunliang Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
349
87
0
13 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Junlong Li
Xiang Yue
Bo Li
Ping Nie
Dayou Du
Lei Ma
LRM
504
17
0
13 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
733
187
0
10 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Hao Wu
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
600
361
0
09 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
Jiawei Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
398
11
0
08 Mar 2025
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao
Ting Zhang
Jiayu Sun
Mi Tian
Hua Huang
221
4
0
07 Mar 2025
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
Feng Ni
Kui Huang
Yao Lu
Wenyu Lv
Guanzhong Wang
Zeyu Chen
Wenshu Fan
VLM
462
2
0
06 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Yanzhe Zhang
Xiren Zhou
MoE
SyDa
299
294
0
03 Mar 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
235
1
0
19 Feb 2025
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
266
2
0
17 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
354
11
0
17 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Daniel Schwalbe-Koda
B. Selman
Qingsong Wen
LRM
573
27
0
05 Feb 2025
Large Language Models are Few-shot Multivariate Time Series Classifiers
Data mining and knowledge discovery (DMKD), 2025
Yakun Chen
Zihao Li
Chao Yang
Xianzhi Wang
Guandong Xu
AI4TS
203
6
0
30 Jan 2025
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Sami Baral
L. Lucy
Ryan Knight
Alice Ng
Luca Soldaini
Neil T. Heffernan
Kyle Lo
315
11
0
28 Jan 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
624
21
0
03 Jan 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Computer Vision and Pattern Recognition (CVPR), 2024
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
514
5
0
20 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
International Conference on Learning Representations (ICLR), 2024
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Bin Wang
Ding Wang
Tao Chen
Junchi Yan
Bo Zhang
465
24
0
16 Dec 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Chenyu Yang
Xuan Dong
X. Zhu
Weijie Su
Jiahao Wang
H. Tian
Zheyu Chen
Wenhai Wang
Lewei Lu
Jifeng Dai
VLM
240
11
0
12 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Ding Wang
Tao Chen
Bo Zhang
Xiangyu Yue
618
9
0
08 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zhiyong Yang
Xiangyu Yue
MLLM
AuLLM
VLM
282
25
0
03 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
403
4
0
02 Dec 2024
AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Runhui Huang
...
Lanqing Hong
J. N. Han
Xiaodan Liang
Hang Xu
Xiaodan Liang
LRM
611
18
0
18 Nov 2024
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Lu
Yinghao Chen
Cheng Chen
Hui Tan
Boheng Chen
...
Aojun Zhou
Yafei Wen
Xiaoxin Chen
Shuai Ren
Jiaming Song
210
20
0
16 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
536
186
1
15 Nov 2024
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
Linger Deng
Linghao Zhu
Bohan Li
Dongliang Luo
Liang Wu
Chengquan Zhang
Pengyuan Lyu
Ziyang Zhang
Gang Zhang
ReLM
3DV
LRM
386
13
0
23 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
404
88
0
21 Oct 2024
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
Tiancheng Gu
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
MLLM
VLM
373
1
0
18 Oct 2024
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Aditya Sharma
Aman Dalmia
Mehran Kazemi
Christopher J. Pal
Amal Zouaq
LRM
159
8
0
17 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
383
68
0
10 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
370
29
0
10 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
263
19
0
06 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
307
66
1
30 Sep 2024
KALE-LM-Chem: Vision and Practice Toward an AI Brain for Chemistry
Weichen Dai
Yezeng Chen
Zijie Dai
Zhijie Huang
Wenshu Fan
...
Chengli Zhong
Xinhe Li
Zeyu Wang
Zhuoying Feng
Yi Zhou
285
0
0
27 Sep 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
206
32
0
19 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
308
114
0
17 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM
LRM
193
15
0
10 Sep 2024
Previous
1
2
3
4
5
Next
Page 3 of 5