arXiv: 2410.01257
HelpSteer2-Preference: Complementing Ratings with Preferences

2 October 2024
Zhilin Wang
Alexander Bukharin
Olivier Delalleau
Daniel Egert
Gerald Shen
Jiaqi Zeng
Oleksii Kuchaiev
Yi Dong
    ALM

Papers citing "HelpSteer2-Preference: Complementing Ratings with Preferences"

30 / 30 papers shown
Mapping the Italian Telegram Ecosystem: Communities, Toxicity, and Hate Speech
Lorenzo Alvisi
S. Tardelli
Maurizio Tesconi
37
0
0
28 Apr 2025
Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He
He Zhao
Stephen Wan
Dadong Wang
Lina Yao
Tongliang Liu
22
0
0
19 Apr 2025
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang
Ruizhe Chen
Yang Feng
Zuozhu Liu
38
0
0
17 Apr 2025
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
67
0
0
16 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Z. Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
27
0
0
08 Apr 2025
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang
Harshita Diddee
Susan Holm
Hanchen Liu
Xinyue Liu
Vinay Samuel
Barry Wang
Daphne Ippolito
29
1
0
07 Apr 2025
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He
Wenbin Zhang
Jiaxi Song
Cheng Qian
Z. Fu
...
Hui Xue
Ganqu Cui
Wanxiang Che
Zhiyuan Liu
Maosong Sun
26
0
0
04 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
44
9
0
03 Apr 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu-Hu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Yu Wu
Chenlin Ming
H. V. Zhao
Conghui He
Lijun Wu
LRM
59
0
0
21 Mar 2025
Tuning LLMs by RAG Principles: Towards LLM-native Memory
Jiale Wei
Shuchi Wu
Ruochen Liu
Xiang Ying
Jingbo Shang
Fangbo Tao
RALM
60
0
0
20 Mar 2025
Can LLMs Formally Reason as Abstract Interpreters for Program Analysis?
Jacqueline L. Mitchell
Brian Hyeongseok Kim
Chenyu Zhou
Chao Wang
LRM
53
0
0
16 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
41
1
0
14 Mar 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan
Wenzhen Yuan
Xian Gao
Ye Guo
Daoxin Zhang
Zhe Xu
Yao Hu
Ting Liu
Yuzhuo Fu
LRM
VLM
51
4
0
10 Mar 2025
Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang
Michael J.Q. Zhang
Eunsol Choi
49
0
0
04 Mar 2025
Preference Learning Unlocks LLMs' Psycho-Counseling Skills
Mian Zhang
S. Eack
Zhiyu Zoey Chen
67
1
0
27 Feb 2025
Expect the Unexpected: FailSafe Long Context QA for Finance
Kiran Kamble
M. Russak
Dmytro Mozolevskyi
Muayad Ali
Mateusz Russak
Waseem Alshikh
67
0
0
10 Feb 2025
Improving Video Generation with Human Feedback
Jie Liu
Gongye Liu
Jiajun Liang
Ziyang Yuan
Xiaokun Liu
...
Pengfei Wan
Di Zhang
Kun Gai
Yujiu Yang
Wanli Ouyang
VGen
EGVM
48
12
0
23 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
D. Lin
Jiaqi Wang
VLM
68
17
0
21 Jan 2025
A Roadmap to Guide the Integration of LLMs in Hierarchical Planning
Israel Puerta-Merino
Carlos Núnez-Molina
Pablo Mesejo
Juan Fernández-Olivares
52
2
0
14 Jan 2025
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James Validad Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
45
8
0
08 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
50
0
0
31 Dec 2024
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin
Hongbang Yuan
Tianyi Men
Pengfei Cao
Yubo Chen
Kang-Jun Liu
Jun Zhao
ALM
82
7
0
18 Dec 2024
Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search
Edward Kim
Manil Shrestha
Richard Foty
Tom DeLay
Vicki Seyfert-Margolis
64
1
0
16 Dec 2024
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich
Tomer Ronen
Talor Abramovich
Nir Ailon
Nave Assaf
...
Ido Shahaf
Oren Tropp
Omer Ullman Argov
Ran Zilberstein
Ran El-Yaniv
70
1
0
28 Nov 2024
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Lingpeng Kong
Q. Liu
CoGe
VLM
107
24
0
26 Nov 2024
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
78
14
0
25 Nov 2024
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Chen Yang
Chenyang Zhao
Q. Gu
Dongruo Zhou
LRM
33
0
0
22 Oct 2024
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu
Liwei Jiang
Yejin Choi
29
2
0
03 Oct 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
37
11
0
23 Sep 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
40
0
0
04 Jul 2024