2310.07589
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
11 October 2023
Luiza Amador Pozzobon
B. Ermiş
Patrick Lewis
Sara Hooker
Papers citing "Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models"
24 / 24 papers shown
Title
Risk-Aware Distributional Intervention Policies for Language Models
Bao Nguyen
Binh Nguyen
Duy Nguyen
V. Nguyen
28
1
0
28 Jan 2025
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
24
0
0
24 Oct 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr F. Locatelli
Sara Hooker
A. Ustun
MoE
45
1
0
28 Aug 2024
The Future of Open Human Feedback
Shachar Don-Yehiya
Ben Burtenshaw
Ramon Fernandez Astudillo
Cailean Osborne
Mimansa Jaiswal
...
Omri Abend
Jennifer Ding
Sara Hooker
Hannah Rose Kirk
Leshem Choshen
VLM
ALM
56
4
0
15 Aug 2024
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
Shachar Don-Yehiya
Leshem Choshen
Omri Abend
19
2
0
15 Aug 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
42
14
0
08 Jul 2024
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Xiaochen Li
Zheng-Xin Yong
Stephen H. Bach
CLL
23
11
0
23 Jun 2024
LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
Tianci Liu
Haoyu Wang
Shiyang Wang
Yu Cheng
Jing Gao
ALM
30
0
0
01 Jun 2024
Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding
Zheng Zhao
Emilio Monti
Jens Lehmann
H. Assem
42
21
0
04 May 2024
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li
Zhihua Wei
Han Jiang
Chuanyang Gong
LLMSV
21
2
0
16 Apr 2024
Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability
Jinwei Lu
Yuanfeng Song
Haodi Zhang
Chen Zhang
Raymond Chi-Wing Wong
OOD
24
0
0
10 Apr 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon
Patrick Lewis
Sara Hooker
B. Ermiş
36
7
0
06 Mar 2024
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models
Xin Yi
Linlin Wang
Xiaoling Wang
Liang He
MoMe
32
1
0
23 Feb 2024
Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs
Xun Liang
Hanyu Wang
Shichao Song
Mengting Hu
Xunzhi Wang
Zhiyu Li
Feiyu Xiong
Bo Tang
10
9
0
17 Feb 2024
Representation Surgery: Theory and Practice of Affine Steering
Shashwat Singh
Shauli Ravfogel
Jonathan Herzig
Roee Aharoni
Ryan Cotterell
Ponnurangam Kumaraguru
LLMSV
27
12
0
15 Feb 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
A. Ustun
Viraat Aryabumi
Zheng-Xin Yong
Wei-Yin Ko
Daniel D'souza
...
Shayne Longpre
Niklas Muennighoff
Marzieh Fadaee
Julia Kreutzer
Sara Hooker
ALM
ELM
SyDa
LRM
27
192
0
12 Feb 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
16
119
0
02 Aug 2023
Semiparametric Language Models Are Scalable Continual Learners
Guangyue Peng
Tao Ge
Si-Qing Chen
Furu Wei
Houfeng Wang
KELM
37
10
0
02 Mar 2023
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
226
126
0
25 May 2022
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Uri Alon
Frank F. Xu
Junxian He
Sudipta Sengupta
Dan Roth
Graham Neubig
RALM
70
62
0
28 Jan 2022
Towards Continual Knowledge Learning of Language Models
Joel Jang
Seonghyeon Ye
Sohee Yang
Joongbo Shin
Janghoon Han
Gyeonghun Kim
Stanley Jungkyu Choi
Minjoon Seo
CLL
KELM
222
150
0
07 Oct 2021
Text Detoxification using Large Pre-trained Neural Models
David Dale
Anton Voronov
Daryna Dementieva
V. Logacheva
Olga Kozlova
Nikita Semenov
Alexander Panchenko
39
71
0
18 Sep 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
191
0
15 Sep 2021
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
204
607
0
03 Sep 2019