OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text


10 October 2023
Keiran Paster
Marco Dos Santos
Zhangir Azerbayev
Jimmy Ba
    LRM

Papers citing "OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text"

50 / 64 papers shown
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Y. Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Y. Wang
Xin Jiang
Lifeng Shang
R. Tang
W. Wang
29
0
0
22 Apr 2025
Lugha-Llama: Adapting Large Language Models for African Languages
Happy Buzaaba
Alexander Wettig
David Ifeoluwa Adelani
Christiane Fellbaum
28
0
0
09 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
46
1
0
03 Apr 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin
Lei Ji
Xiao Liu
Yeyun Gong
52
0
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
40
0
0
22 Mar 2025
MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models
J. Li
Lu Yu
Qing Cui
Zhiqiang Zhang
Jun Zhou
Yanfang Ye
Chuxu Zhang
59
0
0
19 Mar 2025
Teaching LLMs How to Learn with Contextual Fine-Tuning
Younwoo Choi
Muhammad Adil Asif
Ziwen Han
John Willes
Rahul G. Krishnan
LRM
36
0
0
12 Mar 2025
Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation
Wenhui Zhang
Huiyu Xu
Zhibo Wang
Zeqing He
Ziqi Zhu
Kui Ren
AAML
PILM
67
0
0
09 Mar 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
Honglin Guo
Kai Lv
Qipeng Guo
Tianyi Liang
Zhiheng Xi
...
Qiuyinzhe Zhang
Y. Sun
K. Chen
Xipeng Qiu
Tao Gui
33
0
0
26 Feb 2025
Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps
Yen-Che Hsiao
Abhishek Dutta
LRM
ReLM
ELM
54
0
0
24 Feb 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Minxuan Lv
Zhenpeng Su
Leiyu Pan
Yizhe Xiong
Zijia Lin
...
Guiguang Ding
Cheng Luo
Di Zhang
Kun Gai
Songlin Hu
MoE
39
0
0
18 Feb 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Fei Wu
Hongxia Yang
LRM
49
1
0
17 Feb 2025
Small Models Struggle to Learn from Strong Reasoners
Yuetai Li
Xiang Yue
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Bhaskar Ramasubramanian
Radha Poovendran
LRM
44
12
0
17 Feb 2025
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yuchen Yan
Yongliang Shen
Yang Liu
Jin Jiang
Xin Xu
M. Zhang
Jian Shao
Yueting Zhuang
ReLM
LRM
51
2
0
17 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
100
15
0
17 Feb 2025
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
Ilia Karmanov
A. Deshmukh
Lukas Voegtle
Philipp Fischer
Kateryna Chumachenko
...
Jarno Seppänen
Jupinder Parmar
Joseph Jennings
Andrew Tao
Karan Sapra
68
0
0
06 Feb 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
104
0
0
03 Feb 2025
NExtLong: Toward Effective Long-Context Training without Long Documents
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
64
1
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
106
135
0
22 Jan 2025
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
J. Zhang
Y. Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL
LRM
ReLM
55
20
0
20 Jan 2025
Mathematical Language Models: A Survey
W. Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
79
12
0
03 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
88
12
0
31 Dec 2024
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
82
21
0
20 Dec 2024
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
Steven Feng
Shrimai Prabhumoye
Kezhi Kong
Dan Su
M. Patwary
M. Shoeybi
Bryan Catanzaro
67
2
0
18 Dec 2024
Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models
Minki Kang
Sung Ju Hwang
Gibbeum Lee
Jaewoong Cho
KELM
32
0
0
01 Nov 2024
STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing
Jiaru Zou
Qing Wang
Pratyush Thakur
Nickvash Kani
LRM
38
2
0
01 Nov 2024
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
Chenyang An
Shima Imani
Feng Yao
Chengyu Dong
Ali Abbasi
...
Samuel Buss
Jingbo Shang
Gayathri Mahalingam
Pramod Sharma
Maurice Diesendruck
LRM
31
1
0
30 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
ToW: Thoughts of Words Improve Reasoning in Large Language Models
Zhikun Xu
Ming Shen
Jacob Dineen
Zhaonan Li
Xiao Ye
Shijie Lu
Aswin Rrv
Chitta Baral
Ben Zhou
LRM
109
1
0
21 Oct 2024
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen
Liang Song
K. Zhou
Wayne Xin Zhao
B. Wang
Weipeng Chen
Ji-Rong Wen
63
0
0
10 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
37
0
03 Oct 2024
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
Dongwei Jiang
Guoxuan Wang
Yining Lu
Andrew Wang
Jingyu Zhang
Chuyu Liu
Benjamin Van Durme
Daniel Khashabi
ReLM
LRM
26
3
0
01 Oct 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM
AI4CE
23
13
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
M. Zhang
Xunliang Cai
Yixin Cao
Liangcai Gao
Zhi Tang
LRM
40
3
0
19 Sep 2024
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
Yungi Kim
Hyunsoo Ha
Sukyung Lee
Jihoo Kim
Seonghoon Yang
Chanjun Park
26
0
0
15 Sep 2024
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts
Rhui Dih Lee
L. Wynter
R. Ganti
MoE
39
1
0
30 Aug 2024
Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?
Kuei-Chun Kao
Ruochen Wang
Cho-Jui Hsieh
ELM
LRM
32
3
0
06 Jul 2024
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Chengpeng Li
Guanting Dong
Mingfeng Xue
Ru Peng
Xiang Wang
Dayiheng Liu
LRM
ReLM
28
11
0
04 Jul 2024
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
Yiyuan Li
Shichao Sun
Pengfei Liu
LRM
49
0
0
01 Jul 2024
Task Oriented In-Domain Data Augmentation
Xiao Liang
Xinyu Hu
Simiao Zuo
Yeyun Gong
Qiang Lou
Yi Liu
Shao-Lun Huang
Jian Jiao
37
2
0
24 Jun 2024
HARE: HumAn pRiors, a key to small language model Efficiency
Lingyun Zhang
Bin Jin
Gaojian Ge
Lunhui Liu
Xuewen Shen
Mingyong Wu
Houqian Zhang
Yongneng Jiang
Shiqi Chen
Shi Pu
ALM
38
0
0
17 Jun 2024
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
Jikun Kang
Xin Zhe Li
Xi Chen
Amirreza Kazemi
Qianyi Sun
...
Xu He
Quan He
Feng Wen
Jianye Hao
Jun Yao
LRM
ReLM
29
14
0
25 May 2024
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
Kun Zhou
Beichen Zhang
Jiapeng Wang
Zhipeng Chen
Wayne Xin Zhao
Jing Sha
Zhichao Sheng
Shijin Wang
Ji-Rong Wen
SyDa
LRM
33
29
0
23 May 2024
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
30
110
0
15 May 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Mayank Mishra
Matt Stallone
Gaoyuan Zhang
Yikang Shen
Aditya Prasad
...
Amith Singhee
Nirmit Desai
David D. Cox
Ruchir Puri
Rameswar Panda
AI4TS
46
54
0
07 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
41
85
0
06 May 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
36
5
0
16 Apr 2024
A Survey on Deep Learning for Theorem Proving
Zhaoyu Li
Jialiang Sun
Logan Murphy
Qidong Su
Zenan Li
Xian Zhang
Kaiyu Yang
Xujie Si
LRM
42
21
0
15 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa
EgoV
41
85
0
11 Apr 2024