TheoremQA: A Theorem-driven Question Answering dataset

arXiv:2305.12524

21 May 2023
Wenhu Chen
Ming Yin
Max W.F. Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
    AIMat

Papers citing "TheoremQA: A Theorem-driven Question Answering dataset"

50 / 91 papers shown
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Y. Wang
Pei Zhang
Jialong Tang
H. Wei
Baosong Yang
...
Y. Zhang
Fei Huang
Junyang Lin
Fei Huang
Jingren Zhou
LRM
50
0
0
25 Apr 2025
Synergizing RAG and Reasoning: A Systematic Review
Yunfan Gao
Yun Xiong
Yijie Zhong
Yuxi Bi
Ming Xue
H. Wang
LRM
AI4CE
31
0
0
22 Apr 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
32
0
0
18 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
63
6
1
14 Apr 2025
UNDO: Understanding Distillation as Optimization
Kushal Kumar Jain
Piyushi Goyal
Kumar Shridhar
31
0
0
03 Apr 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
72
0
0
29 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Y. Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
LLMAG
LRM
58
7
0
29 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
57
0
0
24 Mar 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
44
1
0
20 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
X. Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
C. Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
49
1
0
12 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Y. Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Z. Zhang
Fei Huang
Rui Wang
BDL
LRM
62
6
0
03 Mar 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Guizhen Chen
Weiwen Xu
Hao Zhang
Hou Pong Chan
Chaoqun Liu
Lidong Bing
Deli Zhao
Anh Tuan Luu
Yu Rong
ReLM
LRM
45
3
0
27 Feb 2025
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
Max W.F. Ku
Thomas Chong
Jonathan Leung
Krish Shah
Alvin Yu
Wenhu Chen
LRM
88
3
0
26 Feb 2025
DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance
Xuanfan Ni
Liyan Xu
Chenyang Lyu
Longyue Wang
Mo Yu
Lemao Liu
Fandong Meng
Jie Zhou
Piji Li
40
0
0
24 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
X. Zhang
Yuxuan Dong
Y. Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
L. Zhang
Jun Liu
AIMat
ReLM
LRM
51
2
0
17 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
111
9
0
05 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
84
2
0
01 Feb 2025
Federated Retrieval Augmented Generation for Multi-Product Question Answering
Parshin Shojaee
Sai Sree Harsha
Dan Luo
Akash Maharaj
Tong Yu
Yunyao Li
35
3
0
28 Jan 2025
Using Large Language Models for education managements in Vietnamese with low resources
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
36
0
0
28 Jan 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Yilun Zhao
Lujing Xie
Haowei Zhang
Guo Gan
Yitao Long
...
Xiangru Tang
Zhenwen Liang
Y. Liu
Chen Zhao
Arman Cohan
45
5
0
21 Jan 2025
Toward Adaptive Reasoning in Large Language Models with Thought Rollback
Sijia Chen
Baochun Li
KELM
LRM
71
6
0
27 Dec 2024
ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?
Pragati Shuddhodhan Meshram
Swetha Karthikeyan
Bhavya
Suma Bhat
92
0
0
27 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
52
45
1
15 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
LRM
ReLM
ELM
AIMat
62
1
0
11 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui-cang Wang
LRM
40
4
0
17 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song
Weimin Xiong
Xiutian Zhao
Dawei Zhu
Wenhao Wu
Ke Wang
Cheng Li
Wei Peng
Sujian Li
LLMAG
21
9
0
10 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
B. Li
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
25
13
0
06 Oct 2024
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Murong Yue
Wenlin Yao
Haitao Mi
Dian Yu
Ziyu Yao
Dong Yu
LRM
28
4
0
04 Oct 2024
Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
Xingxuan Li
Weiwen Xu
Ruochen Zhao
Fangkai Jiao
Shafiq R. Joty
Lidong Bing
LRM
37
8
0
02 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
70
195
0
18 Sep 2024
The representation landscape of few-shot learning and fine-tuning in large language models
Diego Doimo
Alessandro Serra
A. Ansuini
Alberto Cazzaniga
83
4
0
05 Sep 2024
WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain
Rounak Meyur
Hung Phan
S. Wagle
Jan Strube
M. Halappanavar
Sameera Horawalavithana
Anurag Acharya
Sai Munikoti
13
0
0
21 Aug 2024
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Bo-Wen Zhang
Yan Yan
Lin Li
Guang Liu
ReLM
LRM
14
5
0
09 Aug 2024
StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions
Zixin Chen
Jiachen Wang
Meng Xia
Kento Shigyo
Dingdong Liu
Rong Zhang
Huamin Qu
50
4
0
17 Jul 2024
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
Junhao Chen
Shengding Hu
Zhiyuan Liu
Maosong Sun
LRM
30
5
0
16 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Bo Zheng
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
41
458
0
15 Jul 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
58
23
0
11 Jul 2024
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong
Xiwen Zhang
Rui Wang
R. Wu
Junxian He
AIMat
LRM
33
30
0
18 Jun 2024
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Shiliang Zhang
Chong Deng
Hai Yu
Jiaqing Liu
Yukun Ma
Chong Zhang
20
1
0
17 Jun 2024
Are Large Language Models Good Statisticians?
Yizhang Zhu
Shiyin Du
Boyan Li
Yuyu Luo
Nan Tang
ELM
19
15
0
12 Jun 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
David Wadden
Kejian Shi
Jacob Morrison
Aakanksha Naik
Shruti Singh
...
Luca Soldaini
Shannon Zejiang Shen
Doug Downey
Hannaneh Hajishirzi
Arman Cohan
42
11
0
10 Jun 2024
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Junlin Wang
Siddhartha Jain
Dejiao Zhang
Baishakhi Ray
Varun Kumar
Ben Athiwaratkun
22
19
0
10 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM
ELM
31
92
0
03 Jun 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Ge Zhang
Scott Qu
Jiaheng Liu
Chenchen Zhang
Chenghua Lin
...
Zi-Kai Zhao
Jiajun Zhang
Wanli Ouyang
Wenhao Huang
Wenhu Chen
ELM
32
44
0
29 May 2024
Oedipus: LLM-enchanced Reasoning CAPTCHA Solver
Gelei Deng
Haoran Ou
Yi Liu
Jie M. Zhang
Tianwei Zhang
Yang Liu
LRM
23
5
0
13 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
38
77
0
06 May 2024