Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation

4 December 2023

Papers citing "Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation"

11 papers, listed from newest to oldest.

Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries. Neil He, Jiahong Liu, Buze Zhang, N. Bui, Ali Maatouk, Menglin Yang, Irwin King, Melanie Weber, Rex Ying. 11 Apr 2025.
Geometric Signatures of Compositionality Across a Language Model's Lifetime. Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng. 02 Oct 2024.
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. 01 Nov 2022.
Toxicity Detection with Generative Prompt-based Inference. Yau-Shian Wang, Y. Chang. 24 May 2022.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. 28 Jan 2022.
MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining. Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk. 15 Oct 2021.
The Intrinsic Dimension of Images and Its Impact on Learning. Phillip E. Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein. 18 Apr 2021.
Probing Classifiers: Promises, Shortcomings, and Advances. Yonatan Belinkov. 24 Feb 2021.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. 31 Dec 2020.
The Lottery Ticket Hypothesis for Pre-trained BERT Networks. Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin. 23 Jul 2020.