Ministral 3

Alexander H. Liu
Kartik Khandelwal
Sandeep Subramanian
Victor Jouault
Abhinav Rastogi
Adrien Sadé
Alan Jeffares
Albert Jiang
Alexandre Cahill
Alexandre Gavaudan
Alexandre Sablayrolles
Amélie Héliou
Amos You
Andy Ehrenberg
Andy Lo
Anton Eliseev
Antonia Calvi
Avinash Sooriyarachchi
Baptiste Bout
Baptiste Rozière
Baudouin De Monicault
Clémence Lanfranchi
Corentin Barreau
Cyprien Courtot
Daniele Grattarola
Darius Dabert
Diego de las Casas
Elliot Chane-Sane
Faruk Ahmed
Gabrielle Berrada
Gaëtan Ecrepont
Gauthier Guinet
Georgii Novikov
Guillaume Kunsch
Guillaume Lample
Guillaume Martin
Gunshi Gupta
Jan Ludziejewski
Jason Rute
Joachim Studnia
Jonas Amar
Joséphine Delas
Josselin Somerville Roberts
Karmesh Yadav
Khyathi Chandu
Kush Jain
Laurence Aitchison
Laurent Fainsin
Léonard Blier
Lingxiao Zhao
Louis Martin
Lucile Saulnier
Luyu Gao
Maarten Buyl
Margaret Jennings
Marie Pellat
Mark Prins
Mathieu Poirée
Mathilde Guillaumin
Matthieu Dinot
Matthieu Futeral
Maxime Darrin
Maximilian Augustin
Mia Chiquier
Michel Schimpf
Nathan Grinsztajn
Neha Gupta
Nikhil Raghuraman
Olivier Bousquet
Olivier Duchenne
Patricia Wang
Patrick von Platen
Paul Jacob
Paul Wambergue
Paula Kurylowicz
Pavankumar Reddy Muddireddy
Philomène Chagniot
Pierre Stock
Pravesh Agrawal
Quentin Torroba
Romain Sauvestre
Roman Soletskyi
Rupert Menneer
Sagar Vaze
Samuel Barry
Sanchit Gandhi
Siddhant Waghjale
Siddharth Gandhi
Soham Ghosh
Srijan Mishra
Sumukh Aithal
Szymon Antoniak
Teven Le Scao
Théo Cachet
Theo Simon Sorg
Thibaut Lavril
Thiziri Nait Saada
Thomas Chabal
Thomas Foubert
Thomas Robert
Abstract

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique that combines pruning with distillation-guided continued training. Every model comes with image understanding capabilities, all under the Apache 2.0 license.
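The abstract describes Cascade Distillation only at a high level. The sketch below is a hypothetical illustration, not the authors' recipe: it shows one way an iterative prune-then-distill loop can be wired up in PyTorch, using a toy MLP in place of a transformer. All names (`ToyLM`, `prune_hidden`, `distill`), the magnitude-based pruning criterion, and the KL distillation loss are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 100, 64

class ToyLM(nn.Module):
    """Stand-in for a transformer: one hidden layer whose width we prune."""
    def __init__(self, hidden: int):
        super().__init__()
        self.up = nn.Linear(DIM, hidden)
        self.down = nn.Linear(hidden, VOCAB)

    def forward(self, x):
        return self.down(F.relu(self.up(x)))

def prune_hidden(teacher: ToyLM, new_hidden: int) -> ToyLM:
    # Structured magnitude pruning (an assumed criterion): keep the
    # hidden units whose up-projection rows have the largest L2 norm.
    keep = teacher.up.weight.norm(dim=1).topk(new_hidden).indices
    student = ToyLM(new_hidden)
    with torch.no_grad():
        student.up.weight.copy_(teacher.up.weight[keep])
        student.up.bias.copy_(teacher.up.bias[keep])
        student.down.weight.copy_(teacher.down.weight[:, keep])
        student.down.bias.copy_(teacher.down.bias)
    return student

def distill(student: ToyLM, teacher: ToyLM, steps: int = 100, T: float = 2.0) -> ToyLM:
    # Continued training with distillation: match the student's token
    # distribution to the teacher's soft targets via temperature-scaled KL.
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(32, DIM)  # placeholder for a pretraining batch
        with torch.no_grad():
            t_logits = teacher(x)
        loss = F.kl_div(
            F.log_softmax(student(x) / T, dim=-1),
            F.softmax(t_logits / T, dim=-1),
            reduction="batchmean",
        ) * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

# Cascade: each pruned model is distilled from its predecessor, then
# becomes the teacher for the next, smaller stage.
model = ToyLM(hidden=512)
for width in (256, 128, 64):
    model = distill(prune_hidden(model, width), teacher=model)
```

The cascade structure is the key idea suggested by the name: rather than pruning a large model directly to the smallest target size, each intermediate model serves as the teacher for the next stage, so the distillation gap stays small at every step.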
