Circumventing interpretability: How to defeat mind-readers

21 December 2022
Lee D. Sharkey
arXiv: 2212.11415

Papers citing "Circumventing interpretability: How to defeat mind-readers"

An Attempt to Unraveling Token Prediction Refinement and Identifying Essential Layers of Large Language Models
Jaturong Kongmanee
34
1
0
28 Jan 2025
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
40
111
0
22 Apr 2024
Don't trust your eyes: on the (un)reliability of feature visualizations
Robert Geirhos
Roland S. Zimmermann
Blair Bilodeau
Wieland Brendel
Been Kim
FAtt
OOD
27
25
0
07 Jun 2023
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
120
316
0
21 Sep 2022
Building machines that adapt and compute like brains
Brenden Lake
J. Tenenbaum
AI4CE
FedML
NAI
AILaw
245
890
0
11 Nov 2017