The Linear Representation Hypothesis and the Geometry of Large Language Models. International Conference on Machine Learning (ICML), 2024.
The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Defining a New NLP Playground. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
A Survey on Knowledge Editing of Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023.
Debiasing Algorithm through Model Adaptation. International Conference on Learning Representations (ICLR), 2024.
Codebook Features: Sparse and Discrete Interpretability for Neural Networks. International Conference on Machine Learning (ICML), 2024.
How do Language Models Bind Entities in Context? International Conference on Learning Representations (ICLR), 2024.
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Knowledge Editing for Large Language Models: A Survey. ACM Computing Surveys (ACM Comput. Surv.), 2023.
In-Context Learning Creates Task Vectors. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Characterizing Mechanisms for Factual Recall in Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Unnatural language processing: How do language models handle machine-generated prompts? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval. International Conference on Learning Representations (ICLR), 2024.
Function Vectors in Large Language Models. International Conference on Learning Representations (ICLR), 2024.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation. International Conference on Learning Representations (ICLR), 2024.
Emptying the Ocean with a Spoon: Should We Edit Models? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective. International Conference on Learning Representations (ICLR), 2024.
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations. International Conference on Learning Representations (ICLR), 2024.
Attribution Patching Outperforms Automated Circuit Discovery. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
A Meta-Learning Perspective on Transformers for Causal Language Modeling. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
SPADE: Sparsity-Guided Debugging for Deep Neural Networks. International Conference on Machine Learning (ICML), 2024.
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.