
arXiv: 2408.00761 · Cited By
Tamper-Resistant Safeguards for Open-Weight LLMs

International Conference on Learning Representations (ICLR), 2025
1 August 2024
Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
AAML · MU

Papers citing "Tamper-Resistant Safeguards for Open-Weight LLMs"

15 / 115 papers shown
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat · 906 · 2,535 · 0 · 31 Dec 2020

The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Kris McGuffie, Alex Newhouse
173 · 162 · 0 · 15 Sep 2020

Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2021
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
ELM · RALM · 2.2K · 6,489 · 0 · 07 Sep 2020

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
European Conference on Computer Vision (ECCV), 2020
Aditya Golatkar, Alessandro Achille, Stefano Soatto
MU · OOD · 388 · 225 · 0 · 05 Mar 2020

Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks
Computer Vision and Pattern Recognition (CVPR), 2020
Aditya Golatkar, Alessandro Achille, Stefano Soatto
CLL · MU · 522 · 667 · 0 · 12 Nov 2019

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
ALM · AI4CE · 433 · 1,396 · 0 · 04 Oct 2019

The Woman Worked as a Babysitter: On Biases in Language Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
692 · 744 · 0 · 03 Sep 2019

On First-Order Meta-Learning Algorithms
Alex Nichol, Joshua Achiam, John Schulman
797 · 2,449 · 0 · 08 Mar 2018

Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
1.6K · 4,387 · 0 · 12 Jun 2017

Snapshot Ensembles: Train 1, get M for free
Gao Huang, Yixuan Li, Geoff Pleiss, Zhuang Liu, John E. Hopcroft, Kilian Q. Weinberger
OOD · FedML · UQCV · 554 · 1,031 · 0 · 01 Apr 2017

Understanding Black-box Predictions via Influence Functions
Pang Wei Koh, Percy Liang
TDI · 506 · 3,287 · 0 · 14 Mar 2017

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine
OOD · 1.6K · 13,470 · 0 · 09 Mar 2017

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
RALM · 1.0K · 3,490 · 0 · 26 Sep 2016

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville
890 · 3,536 · 0 · 15 Aug 2013

ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
ODL · 420 · 6,803 · 0 · 22 Dec 2012