ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.14387
  4. Cited By
AlpacaFarm: A Simulation Framework for Methods that Learn from Human
  Feedback

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

22 May 2023
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
    ALM
ArXivPDFHTML

Papers citing "AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback"

50 / 451 papers shown
Title
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
58
0
0
09 May 2025
ICon: In-Context Contribution for Automatic Data Selection
ICon: In-Context Contribution for Automatic Data Selection
Yixin Yang
Qingxiu Dong
Linli Yao
Fangwei Zhu
Zhifang Sui
41
0
0
08 May 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
53
0
0
05 May 2025
Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Co3^{3}3Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
55
0
0
03 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
95
25
1
01 May 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
76
0
0
29 Apr 2025
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction
Y. Chen
Haoran Li
Yuan Sui
Y. Liu
Yufei He
Y. Song
Bryan Hooi
AAML
SILM
61
0
0
29 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
43
1
0
21 Apr 2025
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Tuhina Tripathi
Manya Wadhwa
Greg Durrett
S. Niekum
27
0
0
20 Apr 2025
Energy-Based Reward Models for Robust Language Model Alignment
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
53
0
0
17 Apr 2025
Evaluating the Diversity and Quality of LLM Generated Content
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula
Shuo Li
Botong Zhang
Vishakh Padmakumar
Kayo Yin
Osbert Bastani
34
1
0
16 Apr 2025
CHARM: Calibrating Reward Models With Chatbot Arena Scores
CHARM: Calibrating Reward Models With Chatbot Arena Scores
Xiao Zhu
Chenmien Tan
Pinzhen Chen
Rico Sennrich
Yanlin Zhang
Hanxu Hu
ALM
24
0
0
14 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
38
1
0
12 Apr 2025
Discriminator-Free Direct Preference Optimization for Video Diffusion
Discriminator-Free Direct Preference Optimization for Video Diffusion
Haoran Cheng
Qide Dong
Liang Peng
Zhizhou Sha
Weiguo Feng
Jinghui Xie
Zhao Song
Shilei Wen
Xiaofei He
Boxi Wu
VGen
41
0
0
11 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
39
0
0
10 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Y. Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
72
0
0
09 Apr 2025
Adversarial Training of Reward Models
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Z. Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
29
0
0
08 Apr 2025
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
Kavana Venkatesh
Connor Dunlop
Pinar Yanardag
DiffM
27
0
0
07 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
49
1
0
01 Apr 2025
CONGRAD:Conflicting Gradient Filtering for Multilingual Preference Alignment
CONGRAD:Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
33
0
0
31 Mar 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
66
0
0
29 Mar 2025
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
Tuo Liang
Zhe Hu
Jing Li
Hao Zhang
Yiren Lu
...
Yiran Qiao
Disheng Liu
Jeirui Peng
Jing Ma
Yu Yin
44
0
0
29 Mar 2025
Probabilistic Uncertain Reward Model
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
56
0
0
28 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
X. Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
39
1
0
28 Mar 2025
Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Beining Xu
Arkaitz Zubiaga
DeLMO
66
0
0
23 Mar 2025
ChatBench: From Static Benchmarks to Human-AI Evaluation
ChatBench: From Static Benchmarks to Human-AI Evaluation
Serina Chang
Ashton Anderson
Jake M. Hofman
ELM
AI4MH
57
2
0
22 Mar 2025
Large Language Model Compression via the Nested Activation-Aware Decomposition
Large Language Model Compression via the Nested Activation-Aware Decomposition
Jun Lu
Tianyi Xu
Bill Ding
David Li
Yu Kang
37
0
0
21 Mar 2025
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
Ling-an Zeng
Guohong Huang
Yi-Lin Wei
Shengbo Gu
Yu-Ming Tang
Jingke Meng
Wei-Shi Zheng
51
2
0
17 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
47
0
0
17 Mar 2025
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
Jia Zhang
Chen-Xi Zhang
Yao Liu
Yi-Xuan Jin
Xiao-Wen Yang
Bo Zheng
Y. Liu
Lan-Zhe Guo
47
2
0
14 Mar 2025
ASIDE: Architectural Separation of Instructions and Data in Language Models
ASIDE: Architectural Separation of Instructions and Data in Language Models
Egor Zverev
Evgenii Kortukov
Alexander Panfilov
Soroush Tabesh
Alexandra Volkova
Sebastian Lapuschkin
Wojciech Samek
Christoph H. Lampert
AAML
52
1
0
13 Mar 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
53
3
0
11 Mar 2025
RePO: ReLU-based Preference Optimization
Junkang Wu
Kexin Huang
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
X. Wang
71
0
0
10 Mar 2025
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
Fenglu Hong
Ravi Raju
Jonathan Li
Bo Li
Urmish Thakker
Avinash Ravichandran
Swayambhoo Jain
Changran Hu
35
0
0
10 Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko
Tianyi Chen
Sungnyun Kim
Tianyu Ding
Luming Liang
Ilya Zharkov
Se-Young Yun
VLM
71
0
0
10 Mar 2025
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Nguyen H K. Do
Truc Nguyen
Malik Hassanaly
Raed Alharbi
Jung Taek Seo
My T. Thai
41
0
0
09 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALM
ELM
45
1
0
07 Mar 2025
TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
Hejie Cui
Alyssa Unell
Bowen Chen
Jason Alan Fries
Emily Alsentzer
Sanmi Koyejo
N. Shah
79
0
0
06 Mar 2025
Effectively Steer LLM To Follow Preference via Building Confident Directions
Bingqing Song
Boran Han
Shuai Zhang
Hao Wang
Haoyang Fang
Bonan Min
Yuyang Wang
Mingyi Hong
LLMSV
47
0
0
04 Mar 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
38
0
0
03 Mar 2025
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Y. Wang
Pei Zhang
Siyuan Huang
Baosong Yang
Z. Zhang
Fei Huang
Rui Wang
BDL
LRM
62
6
0
03 Mar 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
93
0
0
26 Feb 2025
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users
Anikait Singh
Sheryl Hsu
Kyle Hsu
E. Mitchell
Stefano Ermon
Tatsunori Hashimoto
Archit Sharma
Chelsea Finn
SyDa
OffRL
57
1
0
26 Feb 2025
Self-rewarding correction for mathematical reasoning
Self-rewarding correction for mathematical reasoning
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM
KELM
LRM
64
9
0
26 Feb 2025
Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models
Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models
Xu Chu
Zhixin Zhang
Tianyu Jia
Yujie Jin
72
0
0
25 Feb 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
45
0
0
24 Feb 2025
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
Simeng Han
Frank Palma Gomez
Tu Vu
Zefei Li
Daniel Matthew Cer
Hansi Zeng
Chris Tar
Arman Cohan
Gustavo Hernández Ábrego
40
1
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Jiahui Gao
...
Yu Zhang
Zhenguo Li
Xin Jiang
Q. Liu
James T. Kwok
MoE
96
9
0
20 Feb 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
60
23
0
20 Feb 2025
1234...8910
Next