2210.11399
Cited By
Transcending Scaling Laws with 0.1% Extra Compute
20 October 2022
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
Siamak Shakeri
Xavier Garcia
H. Zheng
J. Rao
Aakanksha Chowdhery
Denny Zhou
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
Papers citing
"Transcending Scaling Laws with 0.1% Extra Compute"
50 / 58 papers shown
Title
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs?
Hansi Zeng
Kai Hui
Honglei Zhuang
Zhen Qin
Zhenrui Yue
Hamed Zamani
Dana Alon
33
0
0
16 Apr 2025
CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics
Kai Yin
Chengkai Liu
Ali Mostafavi
Xia Hu
49
8
0
17 Jan 2025
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
12
0
07 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
68
12
0
04 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
26
1
0
23 Oct 2024
Preference Optimization with Multi-Sample Comparisons
Chaoqi Wang
Zhuokai Zhao
Chen Zhu
Karthik Abinav Sankararaman
Michal Valko
...
Zhaorun Chen
Madian Khabsa
Yuxin Chen
Hao Ma
Sinong Wang
57
10
0
16 Oct 2024
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
89
13
0
06 Sep 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
32
6
0
28 Aug 2024
A Survey of Large Language Models for European Languages
Wazir Ali
S. Pyysalo
39
2
0
27 Aug 2024
A Survey on Symbolic Knowledge Distillation of Large Language Models
Kamal Acharya
Alvaro Velasquez
H. Song
SyDa
29
4
0
12 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
34
12
0
06 Jul 2024
PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Linqing Chen
Weilei Wang
Zilong Bai
Peng Xu
Yan Fang
...
Lisha Zhang
Fu Bian
Zhongkai Ye
Lidong Pei
Changyang Tu
AI4MH
LM&MA
40
2
0
26 Jun 2024
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
17
0
0
12 Jun 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
112
347
0
09 Feb 2024
KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
Yanbin Wei
Qiushi Huang
James T. Kwok
Yu Zhang
20
33
0
04 Feb 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
22
61
0
19 Jan 2024
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen
Xingyi Cheng
Pan Li
Yangli-ao Geng
Jing Gong
...
Chiming Liu
Aohan Zeng
Yuxiao Dong
Jie Tang
Leo T. Song
22
98
0
11 Jan 2024
How predictable is language model benchmark performance?
David Owen
ELM
LRM
12
19
0
09 Jan 2024
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan
Ben Bucknall
Herbie Bradley
David M. Krueger
8
6
0
22 Dec 2023
Self-Infilling Code Generation
Lin Zheng
Jianbo Yuan
Zhi Zhang
Hongxia Yang
Lingpeng Kong
18
0
0
29 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
22
6
0
21 Nov 2023
Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models
Sumuk Shashidhar
Abhinav Chinta
Vaibhav Sahai
Zhenhailong Wang
Heng Ji
42
8
0
11 Oct 2023
Self-Supervised Open-Ended Classification with Small Visual Language Models
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
Yuki M. Asano
VLM
14
0
0
30 Sep 2023
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Haoran Xu
Young Jin Kim
Amr Sharaf
Hany Awadalla
24
56
0
20 Sep 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
23
22
0
10 Aug 2023
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill
N. Tan
Michael Witbrock
Patricia J. Riddle
ReLM
KELM
LRM
13
2
0
02 Aug 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
22
60
0
16 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
499
0
12 Jul 2023
Teaching Arithmetic to Small Transformers
Nayoung Lee
Kartik K. Sreenivasan
Jason D. Lee
Kangwook Lee
Dimitris Papailiopoulos
LRM
12
50
0
07 Jul 2023
Improving Retrieval-Augmented Large Language Models via Data Importance Learning
Xiaozhong Lyu
Stefan Grafberger
Samantha Biegel
Shaopeng Wei
Meng Cao
Sebastian Schelter
Ce Zhang
RALM
8
14
0
06 Jul 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
13
112
0
25 May 2023
Multilingual Large Language Models Are Not (Yet) Code-Switchers
Ruochen Zhang
Samuel Cahyawijaya
Jan Christian Blaise Cruz
Genta Indra Winata
Alham Fikri Aji
LRM
28
49
0
23 May 2023
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim
J. H. Lee
Sungdong Kim
Joonsuk Park
Kang Min Yoo
S. Kwon
Dongsoo Lee
MQ
30
96
0
23 May 2023
Evaluation of medium-large Language Models at zero-shot closed book generative question answering
René Peinl
Johannes Wirth
ELM
10
5
0
19 May 2023
StructGPT: A General Framework for Large Language Model to Reason over Structured Data
Jinhao Jiang
Kun Zhou
Zican Dong
Keming Ye
Wayne Xin Zhao
Ji-Rong Wen
LRM
LMTD
RALM
39
255
0
16 May 2023
Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning
Georgia Chalvatzaki
A. Younes
Daljeet Nandha
An T. Le
Leonardo F. R. Ribeiro
Iryna Gurevych
LM&Ro
LRM
LLMAG
22
30
0
12 May 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
141
163
0
03 May 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
4
1
0
24 Mar 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
19
15
0
17 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
15
116
0
31 Jan 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
13
611
0
31 Jan 2023
Inconsistencies in Masked Language Models
Tom Young
Yunan Chen
Yang You
16
2
0
30 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
27
2,297
0
09 Nov 2022
Can language models handle recursively nested grammatical structures? A case study on comparing models and humans
Andrew Kyle Lampinen
ReLM
ELM
19
28
0
27 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
51
2,959
0
20 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
165
320
0
06 Oct 2022
Compositional Semantic Parsing with Large Language Models
Andrew Drozdov
Nathanael Schärli
Ekin Akyürek
Nathan Scales
Xinying Song
Xinyun Chen
Olivier Bousquet
Denny Zhou
ReLM
LRM
187
91
0
29 Sep 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
22
2,309
0
15 Jun 2022
A Study on Transformer Configuration and Training Objective
Fuzhao Xue
Jianghai Chen
Aixin Sun
Xiaozhe Ren
Zangwei Zheng
Xiaoxin He
Yongming Chen
Xin Jiang
Yang You
12
7
0
21 May 2022