ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.09270
  4. Cited By
Towards Safer Generative Language Models: A Survey on Safety Risks,
  Evaluations, and Improvements

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

18 February 2023
Jiawen Deng
Jiale Cheng
Hao-Lun Sun
Zhexin Zhang
Minlie Huang
    LM&MA
    ELM
ArXivPDFHTML

Papers citing "Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements"

14 / 14 papers shown
Title
REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM
Madhur Jindal
Saurabh Deshpande
AAML
43
0
0
07 May 2025
COBIAS: Contextual Reliability in Bias Assessment
COBIAS: Contextual Reliability in Bias Assessment
Priyanshul Govil
Hemang Jain
Vamshi Bonagiri
Aman Chadha
Ponnurangam Kumaraguru
Manas Gaur
S. Dey
29
2
0
22 Feb 2024
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
67
10,890
0
18 Jul 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from
  Text Edits
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
53
34
0
01 Jan 2023
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Diffusion-LM Improves Controllable Text Generation
Diffusion-LM Improves Controllable Text Generation
Xiang Lisa Li
John Thickstun
Ishaan Gulrajani
Percy Liang
Tatsunori B. Hashimoto
AI4CE
163
768
0
27 May 2022
You Don't Know My Favorite Color: Preventing Dialogue Representations
  from Revealing Speakers' Private Personas
You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas
Haoran Li
Yangqiu Song
Lixin Fan
59
17
0
26 Apr 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
210
364
0
15 Oct 2021
Text Detoxification using Large Pre-trained Neural Models
Text Detoxification using Large Pre-trained Neural Models
David Dale
Anton Voronov
Daryna Dementieva
V. Logacheva
Olga Kozlova
Nikita Semenov
Alexander Panchenko
39
71
0
18 Sep 2021
Challenges in Detoxifying Language Models
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
242
191
0
15 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based
  Bias in NLP
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
254
374
0
28 Feb 2021
Extracting Training Data from Large Language Models
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
264
1,798
0
14 Dec 2020
The Woman Worked as a Babysitter: On Biases in Language Generation
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
204
607
0
03 Sep 2019
1