ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

4 February 2025
Loubna Ben Allal
Anton Lozhkov
Elie Bakouch
Gabriel Martín Blázquez
Guilherme Penedo
Lewis Tunstall
Andrés Marafioti
Hynek Kydlícek
Agustín Piqueres Lajarín
Vaibhav Srivastav
Joshua Lochner
Caleb Fahlgren
Xuan-Son Nguyen
Clémentine Fourrier
Ben Burtenshaw
Hugo Larcher
Haojun Zhao
Cyril Zakka
Mathieu Morlon
Colin Raffel
Leandro von Werra
Thomas Wolf
    MoE

Papers citing "SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model"

35 / 35 papers shown
Revela: Dense Retriever Learning via Language Modeling
Fengyu Cai
Tong Chen
Xinran Zhao
Sihao Chen
Hongming Zhang
Sherry Tongshuang Wu
Iryna Gurevych
Heinz Koeppl
RALM, VLM
19 Jun 2025
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani
Hanieh Fattahi
14 Jun 2025
Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Kyeonghyun Kim
Jinhee Jang
Juhwan Choi
Yoonji Lee
Kyohoon Jin
Youngbin Kim
09 Jun 2025
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao
Massimo Roberto Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Shen Ng
...
Junyan Wang
Zheyuan Liu
Daniel J. Beutel
Lingjuan Lyu
Nicholas D. Lane
ALM
03 Jun 2025
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Wenhao Liu
Zhenyi Lu
Xinyu Hu
Jierui Zhang
Dailin Li
...
Pei Zhang
Chengbo Zhang
Yuxiang Ren
Xiaohong Huang
Yan Ma
OffRL
02 Jun 2025
K-order Ranking Preference Optimization for Large Language Models
Shihao Cai
Chongming Gao
Yang Zhang
Wentao Shi
Jizhi Zhang
Keqin Bao
Qifan Wang
Fuli Feng
ALM
31 May 2025
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Xiaoang Xu
Shuo Wang
Xu Han
Zhenghao Liu
Huijia Wu
P. Li
Zhiyuan Liu
Maosong Sun
Zhaofeng He
LRM
30 May 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
28 May 2025
NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities
Abdellah El Mekki
Houdaifa Atou
Omer Nacar
Shady Shehata
Muhammad Abdul-Mageed
23 May 2025
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
Issey Sukeda
Takuro Fujii
Kosei Buma
Shunsuke Sasaki
Shinnosuke Ono
ELM
22 May 2025
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
Bohan Zhou
Yi Zhan
Zhongbin Zhang
Zongqing Lu
22 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
19 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
18 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE, LRM
15 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
12 May 2025
FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
Sanghyeon Park
Soo-Mook Moon
07 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Yixiao Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
07 May 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
20 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
19 Apr 2025
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
14 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
03 Apr 2025
UNDO: Understanding Distillation as Optimization
Kushal Kumar Jain
Piyushi Goyal
Kumar Shridhar
03 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP, VLM
Presented at ResearchTrend Connect | VLM on 04 Jun 2025
01 Apr 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
18 Mar 2025
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde
Tassilo Wald
Tobias Schumacher
Klaus H. Maier-Hein
Markus Strohmaier
Adriana Iamnitchi
AI4TS, VLM
13 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM, ReLM, LRM
11 Mar 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Viktor Gsteiger
Ana Klimovic
MoE
27 Feb 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
26 Feb 2025
On Pruning State-Space LLMs
Tamer Ghattas
Michael Hassid
Roy Schwartz
26 Feb 2025
Machine-generated text detection prevents language model collapse
George Drayson
Emine Yilmaz
Vasileios Lampos
DeLMO
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
19 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
16 Feb 2025
Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning
Jialu Tang
Tong Xia
Yuan Lu
Cecilia Mascolo
Aaqib Saeed
AI4MH
18 Oct 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
02 Sep 2024
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
10 May 2023