Title
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs Haoming Yang Ke Ma X. Jia Yingfei Sun Qianqian Xu Q. Huang AAML 45 0 0 03 May 2025
Learn Your Reference Model for Real Good Alignment Alexey Gorbatovski Boris Shaposhnikov Alexey Malakhov Nikita Surnachev Yaroslav Aksenov Ian Maksimov Nikita Balagansky Daniil Gavrilov OffRL 45 25 0 15 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts Jiahao Yu Xingwei Lin Zheng Yu Xinyu Xing SILM 110 292 0 19 Sep 2023