ResearchTrend.AI
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

29 May 2024
Zhenwen Liang
Dian Yu
Wenhao Yu
Wenlin Yao
Zhihan Zhang
Xiangliang Zhang
Dong Yu
    LRM

Papers citing "MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions"

12 / 12 papers shown
Verifiable Accuracy and Abstention Rewards in Curriculum RL to Alleviate Lost-in-Conversation
Ming Li
KELM
117
0
0
21 Oct 2025
$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Deyu Zou
Yongqiang Chen
Jianxiang Wang
Haochen Yang
Mufei Li
James Cheng
Pan Li
Yu Gong
LRM
73
0
0
14 Oct 2025
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
Kartikeya Badola
Jonathan Simon
Arian Hosseini
Sara Mc Carthy
Tsendsuren Munkhdalai
Abhimanyu Goyal
Tomás Kociský
Shyam Upadhyay
Bahare Fatemi
Mehran Kazemi
LRM
124
2
0
13 Aug 2025
Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
Khizar Anjuma
Muhammad Arbab Arshad
Kadhim Hayawi
Efstathios Polyzos
A. Tariq
...
Nishith Reddy Mannuru
Ravi Varma Kumar Bevara
Taslim Mahbub
Muhammad Zeeshan Akram
Sakib Shahriar
ELM LRM
353
2
0
15 Jun 2025
Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks
Wenhan Dong
Tianyi Hu
Jingyi Zheng
Zhen Sun
Yuemeng Zhao
Yule Liu
Xinlei He
Xinyi Huang
LRM ELM
144
2
0
28 May 2025
Exploring Communication Strategies for Collaborative LLM Agents in Mathematical Problem-Solving
L. Zhang
Xiaoming Zhai
Jionghao Lin
Jennifer Kleiman
Diego Zapata-Rivera
Carol Forsyth
Yang Jiang
Xiangen Hu
Arthur C. Graesser
LLMAG
71
0
0
02 May 2025
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yao Shi
Rongkeng Liang
Yong Xu
LLMAG AI4Ed ELM
250
4
0
21 Apr 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhuoshi Pan
Yu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Wu
Chenlin Ming
H. Vicky Zhao
Bin Wang
Lijun Wu
LRM
362
13
0
21 Mar 2025
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
Jakub Macina
Nico Daheim
Ido Hakimi
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
ELM
428
16
0
26 Feb 2025
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Yongbin Li
Miao Zheng
Fan Yang
Bin Cui
Tengjiao Wang
Xin Wu
Guosheng Dong
Wentao Zhang
ALM
301
10
0
12 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
206
35
0
05 Oct 2024
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
International Conference on Learning Representations (ICLR), 2023
Haipeng Luo
Qingfeng Sun
Can Xu
Lu Wang
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
LRM OSLM
760
611
0
18 Aug 2023