Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

15 August 2023
Aojun Zhou
Ke Wang
Zimu Lu
Weikang Shi
Sichun Luo
Zipeng Qin
Shaoqing Lu
Anya Jia
Linqi Song
Mingjie Zhan
Hongsheng Li
    ReLM
    LRM

Papers citing "Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification"

50 / 131 papers shown
Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics
Zena Al-Khalili
Nick Howell
Dietrich Klakow
LRM
21
0
0
24 Apr 2025
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
32
0
0
18 Apr 2025
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
Liangyu Xu
Yingxiu Zhao
J. Wang
Yingyao Wang
Bu Pi
...
Jihao Gu
X. Li
Xiaoyong Zhu
Jun Song
Bo Zheng
LRM
91
1
0
17 Apr 2025
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Zuoli Tang
Junjie Ou
Kaiqin Hu
Chunwei Wu
Zhaoxin Huan
Chilin Fu
Xiaolu Zhang
Jun Zhou
Chenliang Li
ReLM
LRM
35
0
0
13 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
32
2
0
07 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
29
0
0
04 Apr 2025
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
59
3
0
23 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning
Wei Sun
Qianlong Du
Fuwei Cui
Jiajun Zhang
OffRL
LRM
31
0
0
04 Mar 2025
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
Yi-Fan Zhang
Hang Li
D. Song
Lichao Sun
Tianlong Xu
Qingsong Wen
LLMAG
LRM
85
2
0
20 Feb 2025
An Efficient Row-Based Sparse Fine-Tuning
Cen-Jhih Li
Aditya Bhaskara
49
0
0
17 Feb 2025
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
44
0
0
17 Feb 2025
Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models
Sijia Chen
Baochun Li
Di Niu
LLMAG
LRM
AI4CE
67
11
0
08 Jan 2025
Mathematical Language Models: A Survey
W. Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
79
12
0
03 Jan 2025
Toward Adaptive Reasoning in Large Language Models with Thought Rollback
Sijia Chen
Baochun Li
KELM
LRM
71
6
0
27 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
57
30
0
25 Dec 2024
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Hieu Tran
Zonghai Yao
Junda Wang
Yifan Zhang
Zhichao Yang
Hong-ye Yu
LRM
71
5
0
03 Dec 2024
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents
Raj Jaiswal
Dhruv Jain
Harsh Parimal Popat
Avinash Anand
Abhishek Dharmadhikari
Atharva Marathe
R. Shah
LRM
AI4CE
87
3
0
01 Dec 2024
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Xiaoxuan Lou
Chaojie Wang
Bo An
LLMAG
LRM
67
0
0
28 Nov 2024
Curriculum Demonstration Selection for In-Context Learning
Duc Anh Vu
Nguyen Tran Cong Duy
Xiaobao Wu
Hoang Minh Nhat
Du Mingzhe
Nguyen Thanh Thong
Anh Tuan Luu
67
0
0
27 Nov 2024
PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
Zhen Sun
Tianshuo Cong
Yule Liu
Chenhao Lin
Xinlei He
Rongmao Chen
Xingshuo Han
Xinyi Huang
AAML
72
3
0
26 Nov 2024
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
Chenyang An
Shima Imani
Feng Yao
Chengyu Dong
Ali Abbasi
...
Samuel Buss
Jingbo Shang
Gayathri Mahalingam
Pramod Sharma
Maurice Diesendruck
LRM
26
1
0
30 Oct 2024
Retrieval-Augmented Generation with Estimation of Source Reliability
Jeongyeon Hwang
Junyoung Park
Hyejin Park
Sangdon Park
Jungseul Ok
RALM
42
0
0
30 Oct 2024
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Z. Li
Liwei Wang
LRM
30
5
0
17 Oct 2024
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang
Pei Zhang
Baosong Yang
Derek F. Wong
Rui-cang Wang
LRM
40
4
0
17 Oct 2024
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning
Vernon Y.H. Toh
Deepanway Ghosal
Soujanya Poria
LRM
43
2
0
16 Oct 2024
Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning
Gisang Lee
Sangwoo Park
Junyoung Park
Andrew Chung
Sieun Park
Yoonah Park
Byungju Kim
Min-gyu Cho
LRM
22
1
0
13 Oct 2024
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu
Aojun Zhou
Ke Wang
Houxing Ren
Weikang Shi
Junting Pan
Mingjie Zhan
Hongsheng Li
LRM
57
7
0
10 Oct 2024
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models
Wenting Tan
Dongxiao Chen
Jieting Xue
Zihao Wang
Taijie Chen
LRM
20
0
0
10 Oct 2024
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines
Junyu Lai
Jiahe Xu
Yao Yang
Yunpeng Huang
Chun Cao
Jingwei Xu
LRM
24
2
0
10 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
24
4
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
32
17
0
05 Oct 2024
Steering Large Language Models between Code Execution and Textual Reasoning
Yongchao Chen
Harsh Jhamtani
Srinagesh Sharma
Chuchu Fan
Chi Wang
LLMAG
LRM
31
6
0
04 Oct 2024
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning
Huimu Yu
Xing Wu
Weidong Yin
Debing Zhang
Songlin Hu
LRM
20
5
0
03 Oct 2024
HLB: Benchmarking LLMs' Humanlikeness in Language Use
Xufeng Duan
Bei Xiao
Xuemei Tang
Zhenguang G. Cai
22
3
0
24 Sep 2024
Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning
Jiaxin Wen
Jian Guan
Hongning Wang
Wei Wu
Minlie Huang
ReLM
OffRL
LRM
26
7
0
19 Sep 2024
AI-Driven Virtual Teacher for Enhanced Educational Efficiency: Leveraging Large Pretrain Models for Autonomous Error Analysis and Correction
Tianlong Xu
Yi-Fan Zhang
Zhendong Chu
Shen Wang
Qingsong Wen
24
5
0
14 Sep 2024
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
Dian Yu
Baolin Peng
Ye Tian
Linfeng Song
Haitao Mi
Dong Yu
ALM
LRM
31
1
0
28 Aug 2024
Benchmarks as Microscopes: A Call for Model Metrology
Michael Stephen Saxon
Ari Holtzman
Peter West
William Yang Wang
Naomi Saphra
23
10
0
22 Jul 2024
COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation
Sannyuya Liu
Jintian Feng
Zongkai Yang
Yawei Luo
Qian Wan
Xiaoxuan Shen
Jianwen Sun
41
3
0
16 Jul 2024
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Songyang Zhang
Chuyu Zhang
Yingfan Hu
Haowen Shen
Kuikun Liu
...
Fengzhe Zhou
Wenwei Zhang
Xuming He
Dahua Lin
Kai-xiang Chen
31
1
0
15 Jul 2024
Lucy: Think and Reason to Solve Text-to-SQL
Nina Narodytska
S. Vargaftik
LMTD
ReLM
AI4TS
LRM
16
2
0
06 Jul 2024
Spontaneous Reward Hacking in Iterative Self-Refinement
Jane Pan
He He
Samuel R. Bowman
Shi Feng
27
10
0
05 Jul 2024
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Chengpeng Li
Guanting Dong
Mingfeng Xue
Ru Peng
Xiang Wang
Dayiheng Liu
LRM
ReLM
26
11
0
04 Jul 2024
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
Yiyuan Li
Shichao Sun
Pengfei Liu
LRM
49
0
0
01 Jul 2024
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu
Aojun Zhou
Ke Wang
Houxing Ren
Weikang Shi
Junting Pan
Mingjie Zhan
Hongsheng Li
LRM
24
22
0
30 Jun 2024
Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback
Zhongtao Miao
Kaiyan Zhao
Yoshimasa Tsuruoka
KELM
LRM
36
2
0
25 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
32
7
0
25 Jun 2024
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis
Yulong Hui
Yao Lu
Huanchen Zhang
RALM
33
9
0
21 Jun 2024
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong
Xiwen Zhang
Rui Wang
R. Wu
Junxian He
AIMat
LRM
33
30
0
18 Jun 2024