ResearchTrend.AI
Paying Alignment Tax with Contrastive Learning

25 May 2025
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
arXiv: 2505.19327

Papers citing "Paying Alignment Tax with Contrastive Learning"

19 / 19 papers shown
A Contrastive Learning Approach to Mitigate Bias in Speech Models
Alkis Koudounas
Flavio Giobergia
Eliana Pastor
Elena Baralis
77
6
0
20 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM, ELM
176
465
0
03 Jun 2024
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe, CLL
112
81
0
12 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
550
12,137
0
18 Jul 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
125
175
0
08 Jul 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li
Mengnan Du
Xin Wang
Ying Wang
98
30
0
04 Jul 2023
Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models
Robert D Morabito
Jad Kabbara
Ali Emami
50
7
0
23 May 2023
Democratizing Neural Machine Translation with OPUS-MT
Jörg Tiedemann
Mikko Aulamo
Daria Bakshandaeva
M. Boggia
Stig-Arne Gronroos
Tommi Nieminen
Alessandro Raganato
Yves Scherrer
Raúl Vázquez
Sami Virpioja
126
33
0
04 Dec 2022
CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
Shuyang Cao
Lu Wang
HILM
79
182
0
19 Sep 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
160
1,956
0
08 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
324
389
0
28 Feb 2021
Measuring and Reducing Gendered Correlations in Pre-trained Models
Kellie Webster
Xuezhi Wang
Ian Tenney
Alex Beutel
Emily Pitler
Ellie Pavlick
Jilin Chen
Ed Chi
Slav Petrov
FaML
119
260
0
12 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
291
1,226
0
24 Sep 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
470
4,587
0
07 Sep 2020
Towards Debiasing Sentence Representations
Paul Pu Liang
Irene Li
Emily Zheng
Y. Lim
Ruslan Salakhutdinov
Louis-Philippe Morency
106
242
0
16 Jul 2020
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
158
388
0
16 Apr 2020
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
Emily Dinan
Angela Fan
Adina Williams
Jack Urbanek
Douwe Kiela
Jason Weston
141
209
0
10 Nov 2019
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
Ran Zmigrod
Sabrina J. Mielke
Hanna M. Wallach
Ryan Cotterell
121
283
0
11 Jun 2019
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
193
5,074
0
27 Jun 2016