All Papers

| Title | Venue | Year |
|---|---|---|
| COSMIC: Generalized Refusal Direction Identification in LLM Activations | Annual Meeting of the Association for Computational Linguistics (ACL) | 2025 |
| Robust AI-Generated Text Detection by Restricted Embeddings | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2024 |
| Mechanistic? | BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP) | 2024 |
| Geometric Signatures of Compositionality Across a Language Model's Lifetime | Annual Meeting of the Association for Computational Linguistics (ACL) | 2024 |
| AutoML-guided Fusion of Entity and LLM-based representations | IFIP Working Conference on Database Semantics (IWDS) | 2024 |
| Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation | International Conference on Machine Learning (ICML) | 2023 |
| Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2023 |
| Outlier Dimensions Encode Task-Specific Knowledge | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2023 |
| LEACE: Perfect linear concept erasure in closed form | Neural Information Processing Systems (NeurIPS) | 2023 |
| BrainBERT: Self-supervised representation learning for intracranial recordings | International Conference on Learning Representations (ICLR) | 2023 |
| Syntax-guided Neural Module Distillation to Probe Compositionality in Sentence Embeddings | Conference of the European Chapter of the Association for Computational Linguistics (EACL) | 2023 |
| Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task | International Conference on Learning Representations (ICLR) | 2022 |
| Reprint: a randomized extrapolation based on principal components for data augmentation | Social Science Research Network (SSRN) | 2022 |
| Kernelized Concept Erasure | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2022 |
| Linear Adversarial Concept Erasure | International Conference on Machine Learning (ICML) | 2022 |
| Putting Words in BERT's Mouth: Navigating Contextualized Vector Spaces with Pseudowords | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2021 |