The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

4 December 2023
Bill Yuchen Lin
Abhilasha Ravichander
Ximing Lu
Nouha Dziri
Melanie Sclar
Khyathi Chandu
Chandra Bhagavatula
Yejin Choi

Papers citing "The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning"

50 / 112 papers shown
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
Yuhang Wang
Yanxu Zhu
Dongyuan Lu
Jitao Sang
26 Nov 2025
Factors That Support Grounded Responses in LLM Conversations: A Rapid Review
Gabriele Cesar Iwashima
Claudia Susie Rodrigues
Claudio Dipolitto
Geraldo Xexéo
24 Nov 2025
Rethinking Deep Alignment Through The Lens Of Incomplete Learning
Thong Bach
D. Nguyen
T. Le
T. Tran
15 Nov 2025
Data Trajectory Alignment for LLM Domain Adaptation: A Two-Phase Synthesis Framework for Telecommunications Mathematics
Z. Zhou
Jing Li
Suming Qiu
J. Huang
Linyuan Qiu
Zhijie Sun
10 Nov 2025
Inference-Time Personalized Alignment with a Few User Preference Queries
Victor-Alexandru Pădurean
Parameswaran Kamalaruban
Nachiket Kotalwar
Alkis Gotovos
Adish Singla
04 Nov 2025
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing
Rongzhi Zhang
Meghaj Tarte
Yuzhao Heng
Xiang Chen
Tong Yu
Lingkai Kong
Sudheer Chava
Chao Zhang
14 Oct 2025
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
Jinyang Zhang
Yue Fang
Hongxin Ding
Weibin Liao
Muyang Ye
Xu Chu
Junfeng Zhao
Yasha Wang
11 Oct 2025
IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tao Feng
Lizhen Qu
Niket Tandon
Gholamreza Haffari
10 Oct 2025
Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar
Deepak Nathani
William Yang Wang
Tanmoy Chakraborty
10 Oct 2025
Reasoning for Hierarchical Text Classification: The Case of Patents
Lekang Jiang
Wenjun Sun
Stephan Goetz
08 Oct 2025
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Ahmed Alajrami
Xingwei Tan
Nikolaos Aletras
03 Oct 2025
Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling
Federico Tiblias
Irina Bigoulaeva
Jingcheng Niu
Simone Balloccu
Iryna Gurevych
01 Oct 2025
On Theoretical Interpretations of Concept-Based In-Context Learning
Huaze Tang
Tianren Peng
Shao-Lun Huang
25 Sep 2025
Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes
Guangliang Liu
Bocheng Chen
Xitong Zhang
K. Johnson
25 Sep 2025
MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
Siyu Yan
Long Zeng
Xuecheng Wu
Chengcheng Han
Kongcheng Zhang
Chong Peng
Xuezhi Cao
Xunliang Cai
Chenjuan Guo
18 Sep 2025
RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation
Chenduo Ying
L. Du
Peng Cheng
Yuanchao Shu
29 Aug 2025
Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap
Jun Wang
Ninglun Gu
Kailai Zhang
Zijiao Zhang
Yelun Bao
...
Liwei Liu
Yihuan Liu
Pengyong Li
Gary G. Yen
Junchi Yan
26 Aug 2025
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni
Mohammed Haddou
Jackie CK Cheung
G. Farnadi
25 Aug 2025
Speculative Safety-Aware Decoding
Xuekang Wang
Shengyu Zhu
Xueqi Cheng
25 Aug 2025
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Birong Pan
Mayi Xu
Qiankun Pi
Jianhao Chen
Yuanyuan Zhu
Ming Zhong
T. Qian
13 Aug 2025
IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization
Yuzhuo Bai
Shitong Duan
Muhua Huang
Jing Yao
Zhenghao Liu
Peng Zhang
Tun Lu
Xiaoyuan Yi
Maosong Sun
Xing Xie
12 Aug 2025
A Survey on Training-free Alignment of Large Language Models
Birong Pan
Yongqi Li
Jiasheng Si
Sibo Wei
Mayi Xu
Shen Zhou
Yuanyuan Zhu
Ming Zhong
T. Qian
12 Aug 2025
P-Aligner: Enabling Pre-Alignment of Language Models via Principled Instruction Synthesis
Feifan Song
Bofei Gao
Yifan Song
Yi Liu
Weimin Xiong
Yuyang Song
Tianyu Liu
Guoyin Wang
Houfeng Wang
06 Aug 2025
The Homogenizing Effect of Large Language Models on Human Expression and Thought
Zhivar Sourati
Alireza S. Ziabari
Morteza Dehghani
02 Aug 2025
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations
Wenhao Wang
Yanyan Li
Long Jiao
Jiawei Yuan
02 Jul 2025
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Maggie Huan
Yuetai Li
Tuney Zheng
Xiaoyu Xu
Seungone Kim
Minxin Du
Radha Poovendran
Graham Neubig
Xiang Yue
01 Jul 2025
LLM Probability Concentration: How Alignment Shrinks the Generative Horizon
Chenghao Yang
Ari Holtzman
22 Jun 2025
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Gabrel J. Perin
Runjin Chen
Xuxi Chen
Nina S. T. Hirata
Zinan Lin
Junyuan Hong
18 Jun 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu
Chaohui Yu
Fan Wang
Xiang Bai
11 Jun 2025
SoK: Machine Unlearning for Large Language Models
Jie Ren
Yue Xing
Yingqian Cui
Charu C. Aggarwal
Hui Liu
10 Jun 2025
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Feifan Song
Shaohang Wei
Wen Luo
Yuxuan Fan
Tianyu Liu
Guoyin Wang
Houfeng Wang
09 Jun 2025
United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory
HaoYang Shang
Xuan Liu
Zi Liang
J. Zhang
Haibo Hu
Song Guo
07 Jun 2025
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Tim Franzmeyer
Archie Sravankumar
Lijuan Liu
Yuning Mao
Rui Hou
Sinong Wang
Jakob Foerster
Luke Zettlemoyer
Madian Khabsa
04 Jun 2025
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu
Faisal Hamman
Sanghamitra Dutta
02 Jun 2025
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang
Xinyu Zhu
Zilin Xiao
Minhao Jiang
Yu Meng
Jiawei Han
30 May 2025
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiaorui Wu
Xiaofeng Mao
Fei Li
Xin Zhang
Xuanhong Li
Chong Teng
Donghong Ji
Zhuang Li
30 May 2025
Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yifan Lu
Jing Li
Yigeng Zhou
Yihui Zhang
Wenya Wang
Xiucheng Li
Meishan Zhang
Fangming Liu
Jun-chen Yu
Min Zhang
28 May 2025
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Yi Liu
Dianqing Liu
Mingye Zhu
Junbo Guo
Yongdong Zhang
Zhendong Mao
26 May 2025
The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
Kefan Yu
Qingcheng Zeng
Weihao Xuan
Wanxin Li
Jingyi Wu
Rob Voigt
24 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
23 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
23 May 2025
One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models
Haoran Gu
Handing Wang
Yi Mei
Mengjie Zhang
Yaochu Jin
12 May 2025
LLAMAPIE: Proactive In-Ear Conversation Assistants
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tuochao Chen
Nicholas Batchelder
Alisa Liu
Noah A. Smith
Shyamnath Gollakota
07 May 2025
Base Models Beat Aligned Models at Randomness and Creativity
Peter West
Christopher Potts
30 Apr 2025
Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification
Takuma Udagawa
Yang Zhao
H. Kanayama
Bishwaranjan Bhattacharjee
19 Apr 2025
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Ayoung Lee
Ryan Sungmo Kwon
Peter Railton
Lu Wang
15 Apr 2025
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie
Junda Wu
Yiran Shen
Yu Xia
Xintong Li
...
Sachin Kumar
Bodhisattwa Prasad Majumder
Jingbo Shang
Prithviraj Ammanabrolu
Julian McAuley
09 Apr 2025
Representation Bending for Large Language Model Safety
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
02 Apr 2025
LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution
Zhuoran Yang
Jie Peng
02 Apr 2025
Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
S. Lam
Qingcheng Zeng
Jingyi Wu
Rob Voigt
21 Mar 2025
Page 1 of 3