Apple Intelligence Foundation Language Models: Tech Report 2025

Syd Evans
Muyang Yu
Guoli Yin
Yi Qin
Erin Feldman
Isha Garg
Aparna Rajamani
Karla Vega
Walker Cheng
TJ Collins
Hans Han
Raul Rea Menacho
Simon Yeung
Sophy Lee
Phani Mutyala
Ying-Chang Cheng
Zhe Gan
Sprite Chu
Justin Lazarow
Alessandro Pappalardo
Federico Scozzafava
Jing Lu
Erik Daxberger
Laurent Duchesne
Jen Liu
David Güera
Stefano Ligas
Mary Beth Kery
Brent Ramerth
Ciro Sannino
Marcin Eichner
Haoshuo Huang
Rui Qian
Moritz Schwarzer-Becker
David Riazati
Mingfei Gao
Bailin Wang
Jack Cackler
Yang Lu
Ransen Niu
John Dennison
Guillaume Klein
Jeffrey Bigham
Deepak Gopinath
Navid Shiee
Darren Botten
Guillaume Tartavel
Alex Guillen Garcia
Sam Xu
Victoria Mönch
Juan Haladjian
Zi-Yi Dou
Matthias Paulik
Adolfo Lopez Mendez
Zhen Li
Hong-You Chen
Chao Jia
Dhaval Doshi
Zhengdong Zhang
Raunak Manjani
Aaron Franklin
Zhile Ren
David Chen
Artsiom Peshko
Nandhitha Raghuram
Hans Hao
Jiulong Shan
Kavya Nerella
Ramsey Tantawi
Vivek Kumar
Saiwen Wang
Brycen Wershing
Bhuwan Dhingra
Dhruti Shah
Ob Adaranijo
Xin Zheng
Tait Madsen
Hadas Kotek
Chang Liu
Yin Xia
Hanli Li
Suma Jayaram
Yanchao Sun
Ahmed Fakhry
Vasileios Saveris
Dustin Withers
Yanghao Li
Alp Aygar
Andres Romero Mier Y Teran
Kaiwei Huang
Mark Lee
Xiujun Li
Yuhong Li
Tyler Johnson
Jay Tang
Joseph Yitan Cheng
Futang Peng
Andrew Walkingshaw
Lucas Guibert
Abhishek Sharma
Cheng Shen
Piotr Maj
Yasutaka Tanaka
You-Cyuan Jhang
Vivian Ma
Tommi Vehvilainen
Kelvin Zou
Jeff Nichols
Matthew Lei
David Qiu
Yihao Qian
Gokul Santhanam
Wentao Wu
Yena Han
Dominik Moritz
Haijing Fu
Mingze Xu
Vivek Rathod
Jian Liu
Louis D'hauwe
Qin Ba
Haitian Sun
Haoran Yan
Philipp Dufter
Anh Nguyen
Yihao Feng
Emma Wang
Keyu He
Rahul Nair
Sanskruti Shah
Jiarui Lu
Patrick Sonnenberg
Jeremy Warner
Yuanzhi Li
Bowen Pan
Ziyi Zhong
Joe Zhou
Sam Davarnia
Olli Saarikivi
Irina Belousova
Rachel Burger
Shang-Chen Wu
Di Feng
Bas Straathof
James Chou
Yuanyang Zhang
Marco Zuliani
Eduardo Jimenez
Abhishek Sundararajan
Xianzhi Du
Chang Lan
Nilesh Shahdadpuri
Peter Grasch
Sergiu Sima
Josh Newnham
Varsha Paidi
Jianyu Wang
Kaelen Haag
Alex Braunstein
Daniele Molinari
Richard Wei
Brenda Yang
Nicholas Lusskin
Joanna Arreaza-Taylor
Meng Cao
Nicholas Seidl
Simon Wang
Jiaming Hu
Yiping Ma
Mengyu Li
Kieran Liu
Hang Su
Sachin Ravi
Chong Wang
Xin Wang
Kevin Smith
Haoxuan You
Binazir Karimzadeh
Rui Li
Jinhao Lei
Wei Fang
Alec Doane
Sam Wiseman
Ismael Fernandez
Jane Li
Andrew Hansen
Javier Movellan
Christopher Neubauer
Hanzhi Zhou
Chris Chaney
Nazir Kamaldin
Valentin Wolf
Fernando Bermúdez-Medina
Joris Pelemans
Peter Fu
Howard Xing
Xiang Kong
Wayne Shan
Gabriel Jacoby-Cooper
Dongcai Shen
Tom Gunter
Guillaume Seguin
Fangping Shi
Shiyu Li
Yang Xu
Areeba Kamal
Dan Masi
Saptarshi Guha
Qi Zhu
Jenna Thibodeau
Changyuan Zhang
Rebecca Callahan
Charles Maalouf
Wilson Tsao
Boyue Li
Qingqing Cao
Naomy Sabo
Cheng Leong
Yi Wang
Anupama Mann Anupama
Colorado Reed
Kenneth Jung
Zhifeng Chen
Mohana Prasad Sathya Moorthy
Yifei He
Erik Hornberger
Devi Krishna
Senyu Tong
Michael
David Haldimann
Yang Zhao
Bowen Zhang
Chang Gao
Chris Bartels
Sushma Rao
Nathalie Tran
Simon Lehnerer
Co Giang
Patrick Dong
Junting Pan
Biyao Wang
Dongxu Li
Mehrdad Farajtabar
Dongseong Hwang
Grace Duanmu
Eshan Verma
Sujeeth Reddy
Qi Shan
Hongbin Gao
Nan Du
Pragnya Sridhar
Forrest Huang
Yingbo Wang
Nikhil Bhendawade
Diane Zhu
Sai Aitharaju
Fred Hohman
Lauren Gardiner
Chung-Cheng Chiu
Yinfei Yang
Alper Kokmen
Frank Chu
Ke Ye
Kaan Elgin
Oron Levy
John Park
Donald Zhang
Eldon Schoop
Nina Wenzel
Michael Booker
Hyunjik Kim
Chinguun Erdenebileg
Nan Dun
Eric Liang Yang
Priyal Chhatrapati
Vishaal Mahtani
Haiming Gang
Kohen Chia
Deepa Seshadri
Donghan Yu
Yan Meng
Kelsey Peterson
Zhen Yang
Yongqiang Wang
Carina Peng
Doug Kang
Anuva Agarwal
Albert Antony
Juan Lao Tebar
Albin Madappally Jose
Regan Poston
Andy De Wang
Gerard Casamayor
Elmira Amirloo
Violet Yao
Wojciech Kryscinski
Kun Duan
Lezhi L
et al. (297 additional authors not shown)
Main: 19 Pages
5 Figures
Bibliography: 4 Pages
3 Tables
Appendix: 4 Pages
Abstract

We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, mixture-of-experts sparse computation, and interleaved global-local attention to deliver high quality with competitive cost on Apple's Private Cloud Compute platform. Both models are trained on large-scale multilingual and multimodal datasets sourced via responsible web crawling, licensed corpora, and high-quality synthetic data, then further refined with supervised fine-tuning and reinforcement learning on a new asynchronous platform. The resulting models support several additional languages while understanding images and executing tool calls. In public benchmarks and human evaluations, both the server model and the on-device model match or surpass comparably sized open baselines.
