ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.09689
  4. Cited By
Unnatural Instructions: Tuning Language Models with (Almost) No Human
  Labor

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

19 December 2022
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
    ALM
ArXivPDFHTML

Papers citing "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor"

50 / 291 papers shown
Title
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
14
0
0
12 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
14
0
0
10 May 2025
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $α$-$β$-Divergence
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via ααα-βββ-Divergence
Guanghui Wang
Zhiyong Yang
Z. Wang
Shi Wang
Qianqian Xu
Q. Huang
37
0
0
07 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
50
0
0
29 Apr 2025
AndroidGen: Building an Android Language Agent under Data Scarcity
AndroidGen: Building an Android Language Agent under Data Scarcity
Hanyu Lai
Junjie Gao
Xiao-Yang Liu
Y. Xu
S. Zhang
Yuxiao Dong
Jie Tang
LLMAG
72
0
0
27 Apr 2025
Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation
Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation
Jagrit Acharya
Gouri Ginde
36
0
0
26 Apr 2025
A Dual-Space Framework for General Knowledge Distillation of Large Language Models
A Dual-Space Framework for General Knowledge Distillation of Large Language Models
X. Zhang
Songming Zhang
Yunlong Liang
Fandong Meng
Yufeng Chen
Jinan Xu
Jie Zhou
17
0
0
15 Apr 2025
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking
Zihan Gu
Ruoyu Chen
Hua Zhang
Yue Hu
Xiaochun Cao
29
0
0
04 Apr 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
66
0
0
29 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
73
1
0
25 Mar 2025
Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages
Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages
Nick McKenna
X. Xu
Jack Williams
Nick Wilson
Benjamin Van Durme
Christian Poelitz
29
0
0
24 Mar 2025
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Anshumann
Mohd Abbas Zaidi
Akhil Kedia
Jinwoo Ahn
Taehwak Kwon
Kangwook Lee
Haejun Lee
Joohyung Lee
FedML
77
0
0
21 Mar 2025
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Mihai Nadas
Laura Diosan
Andreea Tomescu
SyDa
67
0
0
18 Mar 2025
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
Jia Zhang
Chen-Xi Zhang
Yao Liu
Yi-Xuan Jin
Xiao-Wen Yang
Bo Zheng
Y. Liu
Lan-Zhe Guo
47
2
0
14 Mar 2025
Biases in Large Language Model-Elicited Text: A Case Study in Natural Language Inference
Grace Proebsting
Adam Poliak
40
0
0
06 Mar 2025
SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Tong Zhang
Yu-Shi Zhu
Heyan Huang
47
0
0
05 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
67
0
0
03 Mar 2025
GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation
GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation
Jie He
Jennifer Neville
Mengting Wan
Longqi Yang
Hui Liu
Xiaofeng Xu
Xia Song
Jeff Z. Pan
Pei Zhou
LLMAG
SyDa
58
0
0
26 Feb 2025
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim
Gyouk Chu
Eunho Yang
MoE
54
0
0
18 Feb 2025
DeepThink: Aligning Language Models with Domain-Specific User Intents
DeepThink: Aligning Language Models with Domain-Specific User Intents
Yang Li
Mingxuan Luo
Yeyun Gong
Chen Lin
Jian Jiao
Yi Liu
Kaili Huang
LRM
ALM
ELM
45
0
0
08 Feb 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
82
148
0
28 Jan 2025
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains
Ran Xu
Hui Liu
Sreyashi Nag
Zhenwei Dai
Yaochen Xie
...
Chen Luo
Yang Li
Joyce C. Ho
Carl Yang
Qi He
RALM
68
8
0
28 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
74
2
0
10 Jan 2025
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng
Cheng-Yu Hsieh
Vivek Ramanujan
Matthew Wallingford
Chun-Liang Li
Pang Wei Koh
Ranjay Krishna
DiffM
55
6
0
03 Jan 2025
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
33
3
0
31 Dec 2024
NILE: Internal Consistency Alignment in Large Language Models
NILE: Internal Consistency Alignment in Large Language Models
Minda Hu
Qiyuan Zhang
Yufei Wang
Bowei He
Hongru Wang
Jingyan Zhou
Liangyou Li
Yasheng Wang
Chen-li Ma
Irwin King
81
0
0
21 Dec 2024
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small
  LLMs
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
78
6
0
17 Dec 2024
PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection
PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection
Sepideh Mamooler
Syrielle Montariol
Alexander Mathis
Antoine Bosselut
78
1
0
16 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
89
0
0
01 Dec 2024
On Domain-Specific Post-Training for Multimodal Large Language Models
On Domain-Specific Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
87
2
0
29 Nov 2024
Forecasting Future International Events: A Reliable Dataset for
  Text-Based Event Modeling
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling
Daehoon Gwak
Junwoo Park
Minho Park
C. Park
Hyunchan Lee
E. Choi
Jaegul Choo
64
0
0
21 Nov 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
35
0
0
13 Nov 2024
LIFBench: Evaluating the Instruction Following Performance and Stability
  of Large Language Models in Long-Context Scenarios
LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
Xiaodong Wu
Minhao Wang
Yichen Liu
Xiaoming Shi
He Yan
Xiangju Lu
Junmin Zhu
Wei Zhang
70
3
0
11 Nov 2024
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from
  Foundation Models
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
Yeming Wen
Swarat Chaudhuri
26
0
0
11 Nov 2024
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for
  Improved Prompt Engineering
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
Ishika Joshi
Simra Shahid
Shreeya Venneti
Manushree Vasu
Yantao Zheng
Yunyao Li
Balaji Krishnamurthy
Gromit Yeuk-Yin Chan
22
3
0
09 Nov 2024
CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset
  for Security Research
CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security Research
Sian-Yao Huang
Cheng-Lin Yang
C. Lin
Chun-Ying Huang
26
1
0
02 Nov 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
58
3
0
30 Oct 2024
Rethinking Data Synthesis: A Teacher Model Training Recipe with
  Interpretation
Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation
Yifang Chen
David Zhu
SyDa
30
0
0
27 Oct 2024
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo
Yerin Hwang
Yongil Kim
Taegwan Kang
Hyunkyung Bae
Kyomin Jung
27
0
0
25 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through
  Failure-Inducing Exploration
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration
Qintong Li
Jiahui Gao
Sheng Wang
Renjie Pi
Xueliang Zhao
Chuan Wu
Xin Jiang
Z. Li
Lingpeng Kong
SyDa
23
2
0
22 Oct 2024
"What is the value of {templates}?" Rethinking Document Information
  Extraction Datasets for LLMs
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
18
0
0
20 Oct 2024
IterSelectTune: An Iterative Training Framework for Efficient
  Instruction-Tuning Data Selection
IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection
Jielin Song
Siyu Liu
Bin Zhu
Yanghui Rao
25
2
0
17 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Z. Liu
Shiwei Li
...
Chenkai Zhang
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
36
15
0
16 Oct 2024
Self-Boosting Large Language Models with Synthetic Preference Data
Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong
Li Dong
Xingxing Zhang
Zhifang Sui
Furu Wei
SyDa
34
1
0
09 Oct 2024
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric
  Understanding
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
Keliang Li
Zaifei Yang
Jiahe Zhao
Hongze Shen
Ruibing Hou
Hong Chang
Shiguang Shan
Xilin Chen
VLM
26
0
0
09 Oct 2024
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu
Shujian Zhang
Kaiqiang Song
Silei Xu
Sanqiang Zhao
Ravi Agrawal
Sathish Indurthi
Chong Xiang
Prateek Mittal
Wenxuan Zhou
37
7
0
09 Oct 2024
On Instruction-Finetuning Neural Machine Translation Models
On Instruction-Finetuning Neural Machine Translation Models
Vikas Raunak
Roman Grundkiewicz
Marcin Junczys-Dowmunt
18
1
0
07 Oct 2024
Data Advisor: Dynamic Data Curation for Safety Alignment of Large
  Language Models
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
Fei Wang
Ninareh Mehrabi
Palash Goyal
Rahul Gupta
Kai-Wei Chang
Aram Galstyan
ALM
29
0
0
07 Oct 2024
Exploring the Benefit of Activation Sparsity in Pre-training
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang
Chaojun Xiao
Qiujieli Qin
Yankai Lin
Zhiyuan Zeng
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
58
3
0
04 Oct 2024
123456
Next