ResearchTrend.AI
arXiv: 2403.15796
Understanding Emergent Abilities of Language Models from the Loss Perspective

23 March 2024
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
    UQCV
    LRM

Papers citing "Understanding Emergent Abilities of Language Models from the Loss Perspective"

44 / 44 papers shown
Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
25
0
0
26 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
22
0
0
21 Apr 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Ian H. Magnusson
Nguyen Tai
Ben Bogin
David Heineman
Jena D. Hwang
...
Dirk Groeneveld
Oyvind Tafjord
Noah A. Smith
Pang Wei Koh
Jesse Dodge
ALM
20
0
0
15 Apr 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
H. Zhang
Jun Wang
61
0
0
15 Mar 2025
Teaching LLMs How to Learn with Contextual Fine-Tuning
Younwoo Choi
Muhammad Adil Asif
Ziwen Han
John Willes
Rahul G. Krishnan
LRM
31
0
0
12 Mar 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
34
0
0
28 Feb 2025
Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? [Large Language Models: From Word Prediction to Understanding?]
Carlos Gómez-Rodríguez
SyDa
AILaw
ELM
VLM
94
0
0
25 Feb 2025
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Chengyin Xu
Kaiyuan Chen
Xiao Li
Ke Shen
Chenggang Li
OffRL
41
0
0
24 Feb 2025
A Critical Assessment of Modern Generative Models' Ability to Replicate Artistic Styles
Andrea Asperti
Franky George
Tiberio Marras
Razvan Ciprian Stricescu
Fabio Zanotti
EGVM
41
0
0
21 Feb 2025
Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation
Hao Wang
Wei Guo
L. Zhang
Jin Yao Chin
Yufei Ye
Huifeng Guo
Y. Liu
Defu Lian
Ruiming Tang
Enhong Chen
45
1
0
20 Feb 2025
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
Pierre Ablin
Angelos Katharopoulos
Skyler Seto
David Grangier
MoMe
45
0
0
03 Feb 2025
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
J. Zhang
Y. Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL
LRM
ReLM
49
20
0
20 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Y. Liu
40
0
0
06 Jan 2025
Predictable Emergent Abilities of LLMs: Proxy Tasks Are All You Need
Bo Zhang
Yan Yan
Boxiang Yang
Yifei Xue
Guang Liu
LRM
69
0
0
10 Dec 2024
Predicting Emergent Capabilities by Finetuning
Charlie Snell
Eric Wallace
Dan Klein
Sergey Levine
ELM
LRM
75
5
0
25 Nov 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
72
2
0
19 Nov 2024
Scaling up Masked Diffusion Models on Text
Shen Nie
Fengqi Zhu
Chao Du
Tianyu Pang
Qian Liu
Guangtao Zeng
Min-Bin Lin
Chongxuan Li
AI4CE
31
13
0
24 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
41
7
0
11 Oct 2024
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li
Haoran Xu
Weiting Tan
Kenton Murray
Daniel Khashabi
35
1
0
06 Oct 2024
Dynamic neurons: A statistical physics approach for analyzing deep neural networks
Donghee Lee
Hye-Sung Lee
Jaeok Yi
11
1
0
01 Oct 2024
Quantifying Emergence in Neural Networks: Insights from Pruning and Training Dynamics
Faisal AlShinaifi
Zeyad Almoaigel
Johnny Jingze Li
Abdulla Kuleib
Gabriel A. Silva
19
0
0
03 Sep 2024
Scaling Law with Learning Rate Annealing
Howe Tissue
Venus Wang
Lu Wang
19
4
0
20 Aug 2024
Performance Law of Large Language Models
Chuhan Wu
Ruiming Tang
LRM
32
2
0
19 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
20
19
0
31 Jul 2024
Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme
Johnny Jingze Li
V. George
Gabriel A. Silva
ODL
34
0
0
26 Jul 2024
CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
Jiawei Gu
Zacc Yang
Chuanghao Ding
Rui Zhao
Fei Tan
CLL
34
3
0
24 Jul 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
47
2
0
11 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
35
10
0
08 Jul 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM:
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
50
52
0
18 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
24
4
0
14 Jun 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
30
33
0
28 May 2024
Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu
V. Papyan
27
11
0
28 May 2024
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Wenyu Du
Tongxu Luo
Zihan Qiu
Zeyu Huang
Yikang Shen
Reynold Cheng
Yike Guo
Jie Fu
29
4
0
24 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng
Zhenyu Ning
Jieru Zhao
Weihao Cui
Mengwei Xu
Liwei Guo
Xusheng Chen
Yizhou Shan
LLMAG
29
4
0
18 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
18
6
0
14 May 2024
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
31
24
0
15 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
29
275
0
09 Apr 2024
Emergent Abilities in Reduced-Scale Generative Language Models
Sherin Muckatira
Vijeta Deshpande
Vladislav Lialin
Anna Rumshisky
ReLM
ELM
LRM
16
4
0
02 Apr 2024
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin
Baizhou Huang
Haotian Ye
Qinyu Chen
Zihao Wang
Sujian Li
Jianzhu Ma
Xiaojun Wan
James Y. Zou
Yitao Liang
80
20
0
04 Feb 2024
Paloma: A Benchmark for Evaluating Language Model Fit
Ian H. Magnusson
Akshita Bhagia
Valentin Hofmann
Luca Soldaini
A. Jha
...
Iz Beltagy
Hanna Hajishirzi
Noah A. Smith
Kyle Richardson
Jesse Dodge
126
21
0
16 Dec 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
237
840
0
05 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020