2502.02737
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
4 February 2025
Loubna Ben Allal
Anton Lozhkov
Elie Bakouch
Gabriel Martín Blázquez
Guilherme Penedo
Lewis Tunstall
Andrés Marafioti
Hynek Kydlíček
Agustín Piqueres Lajarín
Vaibhav Srivastav
Joshua Lochner
Caleb Fahlgren
Xuan-Son Nguyen
Clémentine Fourrier
Ben Burtenshaw
Hugo Larcher
Haojun Zhao
Cyril Zakka
Mathieu Morlon
Colin Raffel
Leandro von Werra
Thomas Wolf
MoE
Papers citing
"SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model"
35 / 35 papers shown
Title
Revela: Dense Retriever Learning via Language Modeling
Fengyu Cai
Tong Chen
Xinran Zhao
Sihao Chen
Hongming Zhang
Sherry Tongshuang Wu
Iryna Gurevych
Heinz Koeppl
RALM
VLM
18
0
0
19 Jun 2025
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani
Hanieh Fattahi
18
0
0
14 Jun 2025
Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Kyeonghyun Kim
Jinhee Jang
Juhwan Choi
Yoonji Lee
Kyohoon Jin
Youngbin Kim
24
0
0
09 Jun 2025
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao
Massimo Roberto Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Shen Ng
...
Junyan Wang
Zheyuan Liu
Daniel J. Beutel
Lingjuan Lyu
Nicholas D. Lane
ALM
52
1
0
03 Jun 2025
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Wenhao Liu
Zhenyi Lu
Xinyu Hu
Jierui Zhang
Dailin Li
...
Pei Zhang
Chengbo Zhang
Yuxiang Ren
Xiaohong Huang
Yan Ma
OffRL
58
1
0
02 Jun 2025
K-order Ranking Preference Optimization for Large Language Models
Shihao Cai
Chongming Gao
Yang Zhang
Wentao Shi
Jizhi Zhang
Keqin Bao
Qifan Wang
Fuli Feng
ALM
43
0
0
31 May 2025
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Xiaoang Xu
Shuo Wang
Xu Han
Zhenghao Liu
Huijia Wu
P. Li
Zhiyuan Liu
Maosong Sun
Zhaofeng He
LRM
493
1
0
30 May 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
86
0
0
28 May 2025
NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities
Abdellah El Mekki
Houdaifa Atou
Omer Nacar
Shady Shehata
Muhammad Abdul-Mageed
69
0
0
23 May 2025
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
Issey Sukeda
Takuro Fujii
Kosei Buma
Shunsuke Sasaki
Shinnosuke Ono
ELM
74
1
0
22 May 2025
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
Bohan Zhou
Yi Zhan
Zhongbin Zhang
Zongqing Lu
70
0
0
22 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
89
1
0
19 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza
Nawaf Alampara
Martiño Ríos-García
Mohamed Abdelalim
Jack Butler
...
Mark Worrall
Adamo Young
Philippe Schwaller
Michael Pieler
Kevin Maik Jablonka
145
0
0
18 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
91
2
0
15 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
83
0
0
12 May 2025
FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
Sanghyeon Park
Soo-Mook Moon
74
0
0
07 May 2025
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li
Yixiao Liu
Haoqin Tu
Hongru Zhu
Cihang Xie
VLM
440
2
0
07 May 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
124
6
0
20 Apr 2025
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller
Jonas Golde
Alan Akbik
120
0
0
19 Apr 2025
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
94
0
0
14 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
127
1
0
03 Apr 2025
UNDO: Understanding Distillation as Optimization
Kushal Kumar Jain
Piyushi Goyal
Kumar Shridhar
93
0
0
03 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
Presented at ResearchTrend Connect | VLM on 04 Jun 2025
172
6
0
01 Apr 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
156
68
0
18 Mar 2025
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde
Tassilo Wald
Tobias Schumacher
Klaus H. Maier-Hein
Markus Strohmaier
Adriana Iamnitchi
AI4TS
VLM
236
6
0
13 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
113
4
0
11 Mar 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Dan Graur
Viktor Gsteiger
Ana Klimovic
MoE
212
0
0
27 Feb 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
135
2
0
26 Feb 2025
On Pruning State-Space LLMs
Tamer Ghattas
Michael Hassid
Roy Schwartz
94
2
0
26 Feb 2025
Machine-generated text detection prevents language model collapse
George Drayson
Emine Yilmaz
Vasileios Lampos
DeLMO
235
1
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
95
3
0
19 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
189
0
0
16 Feb 2025
Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning
Jialu Tang
Tong Xia
Yuan Lu
Cecilia Mascolo
Aaqib Saeed
AI4MH
102
3
0
18 Oct 2024
Masked Mixers for Language Generation and Retrieval
Benjamin L. Badger
167
0
0
02 Sep 2024
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
183
75
0
10 May 2023