ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

Understanding How Value Neurons Shape the Generation of Specified Values in LLMs

23 May 2025
Yi Su
Jiayi Zhang
Shu Yang
Xinhai Wang
Lijie Hu
Di Wang
OffRL
arXiv: 2505.17712

Papers citing "Understanding How Value Neurons Shape the Generation of Specified Values in LLMs"

30 / 30 papers shown
Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions
Shangrui Nie
Florian Mai
David Kaczér
Charles F Welch
Zhixue Zhao
Lucie Flek
OffRL
80
0
0
15 Aug 2025
MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
Xinyan Jiang
L. Zhang
Jiayi Zhang
Qingsong Yang
Guimin Hu
Di Wang
Lijie Hu
LLMSV
401
6
0
14 Aug 2025
When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models
Keyu Wang
Jin Li
Shu Yang
Zhuoran Zhang
Haiyan Zhao
470
6
0
04 Aug 2025
The Compositional Architecture of Regret in Large Language Models
Xiangxiang Cui
Shu Yang
Tianjin Huang
Wanyu Lin
Lijie Hu
Haiyan Zhao
244
0
0
18 Jun 2025
Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs
Wenrui Zhou
Shu Yang
Qingsong Yang
Zikun Guo
Di Wang
Lijie Hu
Haiyan Zhao
199
6
0
08 Jun 2025
Towards User-level Private Reinforcement Learning with Human Feedback
Jing Zhang
Mingxi Lei
Meng Ding
Mengdi Li
Zihang Xiang
Difei Xu
Jinhui Xu
Di Wang
267
6
0
22 Feb 2025
Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shu Yang
Shenzhe Zhu
Zeyu Wu
Keyu Wang
Junchi Yao
Junchao Wu
Lijie Hu
Mengdi Li
Yang Li
Di Wang
433
24
0
18 Feb 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Guang Dai
Lijie Hu
Di Wang
LRM
391
9
0
17 Feb 2025
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Li Zhang
Wenshuo Dong
Zhuoran Zhang
Shu Yang
Lijie Hu
Ninghao Liu
Pan Zhou
Di Wang
233
9
0
07 Feb 2025
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
Zeping Yu
Sophia Ananiadou
KELM
300
12
0
24 Jan 2025
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
Huandong Wang
Wenjie Fu
Yingzhou Tang
Zhilong Chen
Yanhua Huang
J. Piao
Chen Gao
Fengli Xu
Tao Jiang
Yongqian Li
PILM
224
22
0
17 Jan 2025
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang
Yongqian Li
Zijian Kan
Keyuan Cheng
Lijie Hu
Di Wang
KELM
432
26
0
08 Oct 2024
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zeping Yu
Sophia Ananiadou
LRMMILM
319
26
0
21 Sep 2024
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Yongqi Leng
Deyi Xiong
398
17
0
09 Jul 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
364
95
0
26 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
561
32
0
21 Feb 2024
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
Shu Yang
Muhammad Asif Ali
Cheng-Long Wang
Lijie Hu
Haiyan Zhao
CLLMoE
278
64
0
17 Feb 2024
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Neural Information Processing Systems (NeurIPS), 2023
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
285
55
0
30 Oct 2023
Evaluating the Moral Beliefs Encoded in LLMs
Neural Information Processing Systems (NeurIPS), 2023
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
262
208
0
26 Jul 2023
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
365
765
0
23 May 2023
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ameet Deshpande
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
LM&MALLMAG
240
459
0
11 Apr 2023
Generative Agents: Interactive Simulacra of Human Behavior
ACM Symposium on User Interface Software and Technology (UIST), 2023
Cristina Mata
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Abigail Z. Jacobs
Michael S. Bernstein
LM&RoAI4CE
887
3,153
0
07 Apr 2023
Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study
Yong Cao
Li Zhou
Seolhwa Lee
Laura Cabello
Min Chen
Daniel Hershcovich
266
265
0
30 Mar 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELMALMLM&MA
608
1,873
0
29 Mar 2023
Mass-Editing Memory in a Transformer
International Conference on Learning Representations (ICLR), 2022
Kevin Meng
Arnab Sen Sharma
A. Andonian
Yonatan Belinkov
David Bau
KELMVLM
437
809
0
13 Oct 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
2.1K
18,067
0
04 Mar 2022
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
1.0K
1,999
0
10 Feb 2022
Alignment of Language Agents
Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
G. Irving
247
206
0
26 Mar 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
650
1,177
0
29 Dec 2020
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
4.4K
163,656
0
12 Jun 2017