UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation

31 May 2024

Papers citing "UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation"

Exploiting Contextual Knowledge in LLMs through V-usable Information based Layer Enhancement
Xiaowei Yuan, Zhao Yang, Ziyang Huang, Y. Wang, Siqi Fan, Yiming Ju, Jun Zhao, Kang-Jun Liu
27 0 0 | 22 Apr 2025

Accelerating Particle-based Energetic Variational Inference
Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang
BDL | 59 0 0 | 04 Apr 2025

Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu, Xiangcong Yang, Yiyi Zhou, Chenxin Fang, Baiyang Song, Xiaoshuai Sun, Rongrong Ji
LRM | 76 1 0 | 17 Mar 2025

Shortcut Learning in In-Context Learning: A Survey
Rui Song, Yingji Li, Fausto Giunchiglia, Hao Xu
38 1 0 | 04 Nov 2024

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ZhongXiang Sun, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li
55 7 0 | 15 Oct 2024

NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Zheng Yi Ho, Siyuan Liang, Sen Zhang, Yibing Zhan, Dacheng Tao
26 2 0 | 11 Oct 2024

Characterizing Mechanisms for Factual Recall in Language Models
Qinan Yu, Jack Merullo, Ellie Pavlick
KELM | 42 23 0 | 24 Oct 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien
LRM | 186 116 0 | 30 Apr 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
210 491 0 | 01 Nov 2022

Prototypical Calibration for Few-shot Learning of Language Models
Zhixiong Han, Y. Hao, Li Dong, Yutao Sun, Furu Wei
168 52 0 | 20 May 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
AILaw, LRM | 277 1,114 0 | 18 Apr 2021