Scaling Language Models: Methods, Analysis & Insights from Training Gopher

8 December 2021
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, J. Uesato, John F. J. Mellor, I. Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant M. Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, A. Kuncoro, Aida Nematzadeh, E. Gribovskaya, Domenic Donato, Angeliki Lazaridou, A. Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, N. Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Tobias Pohlen, Z. Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew J. Johnson, Blake A. Hechtman, Laura Weidinger, Iason Gabriel, William S. Isaac, Edward Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem W. Ayoub, Jeff Stanway, L. Bennett, Demis Hassabis, Koray Kavukcuoglu, G. Irving
Abstract

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.
