arXiv:2403.09704
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
8 March 2024
Swapnaja Achintalwar, Ioana Baldini, Djallel Bouneffouf, Joan Byamugisha, Maria Chang, Pierre L. Dognin, E. Farchi, Ndivhuwo Makondo, Aleksandra Mojsilović, Manish Nagireddy, K. Ramamurthy, Inkit Padhi, Orna Raz, Jesus Rios, P. Sattigeri, Moninder Singh, S. Thwala, Rosario A. Uceda-Sosa, Kush R. Varshney
Papers citing
"Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations"
3 papers shown
Scopes of Alignment
Kush R. Varshney, Zahra Ashktorab, Djallel Bouneffouf, Matthew D Riemer, Justin D. Weisz
15 Jan 2025
A Roadmap to Pluralistic Alignment
Taylor Sorensen, Jared Moore, Jillian R. Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, ..., Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi
07 Feb 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022