Fraud-R1: A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements. Annual Meeting of the Association for Computational Linguistics (ACL), 2025.
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning. North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks. Neural Information Processing Systems (NeurIPS), 2023.
Evaluating the Moral Beliefs Encoded in LLMs. Neural Information Processing Systems (NeurIPS), 2023.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Generative Agents: Interactive Simulacra of Human Behavior. ACM Symposium on User Interface Software and Technology (UIST), 2023.
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Mass-Editing Memory in a Transformer. International Conference on Learning Representations (ICLR), 2023.
Training language models to follow instructions with human feedback. Neural Information Processing Systems (NeurIPS), 2022.
Locating and Editing Factual Associations in GPT. Neural Information Processing Systems (NeurIPS), 2022.
Transformer Feed-Forward Layers Are Key-Value Memories. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Attention Is All You Need. Neural Information Processing Systems (NeurIPS), 2017.