Rigorously Assessing Natural Language Explanations of Neurons

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

Jing-ling Huang

Atticus Geiger

Karel D'Oosterlinck

Zhengxuan Wu

Christopher Potts

MILM

241

19 Sep 2023

Cross-Lingual Knowledge Editing in Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

226

16 Sep 2023

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

298

14 Sep 2023

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

International Conference on Learning Representations (ICLR), 2023

577

105

13 Sep 2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

306

12 Sep 2023

Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

236

11 Sep 2023

Neurons in Large Language Models: Dead, N-gram, Positional

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Elena Voita

Javier Ferrando

Christoforos Nalmpantis

MILM

400

09 Sep 2023

FIND: A Function Description Benchmark for Evaluating Interpretability Methods

Neural Information Processing Systems (NeurIPS), 2023

Shuang Li

262

07 Sep 2023

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

International Conference on Learning Representations (ICLR), 2023

287

288

07 Sep 2023

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Computational Linguistics (CL), 2023

...

733

828

03 Sep 2023

Explainability for Large Language Models: A Survey

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

Haiyan Zhao

Hanjie Chen

Fan Yang

Ninghao Liu

500

710

02 Sep 2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023

316

249

02 Sep 2023

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

Finale Doshi-Velez

215

01 Sep 2023

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

275

27 Aug 2023

AI-Generated Content (AIGC) for Various Data Modalities: A Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

Lin Geng Foo

Hossein Rahmani

Jing Liu

770

27 Aug 2023

Unified Concept Editing in Diffusion Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

380

303

25 Aug 2023

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

AAAI Conference on Artificial Intelligence (AAAI), 2023

Yuheng Chen

Pengfei Cao

Yubo Chen

Kang Liu

Jun Zhao

KELM

334

25 Aug 2023

Overcoming Generic Knowledge Loss with Selective Parameter Update

Computer Vision and Pattern Recognition (CVPR), 2023

380

23 Aug 2023

Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models

Neural Networks (Neural Netw.), 2023

183

22 Aug 2023

GradientCoin: A Peer-to-Peer Decentralized Large Language Models

Yeqi Gao

Zhao Song

Junze Yin

179

21 Aug 2023

DocTER: Evaluating Document-based Knowledge Editing

Information Processing & Management (IPM), 2023

302

19 Aug 2023

Linearity of Relation Decoding in Transformer Language Models

International Conference on Learning Representations (ICLR), 2023

335

140

17 Aug 2023

PMET: Precise Model Editing in a Transformer

AAAI Conference on Artificial Intelligence (AAAI), 2023

Shasha Li

537

183

17 Aug 2023

Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation

AAAI Conference on Artificial Intelligence (AAAI), 2023

Baotian Hu

211

16 Aug 2023

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

Peng Wang

Ningyu Zhang

...

Huajun Chen

258

14 Aug 2023

Explaining Relation Classification Models with Semantic Extents

123

04 Aug 2023

Multimodal Neurons in Pretrained Text-Only Transformers

Antonio Torralba

272

03 Aug 2023

Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI

Avijit Ghosh

D. Lakshmi

02 Aug 2023

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

ACM Multimedia (ACM MM), 2023

178

02 Aug 2023