ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation. Neural Information Processing Systems (NeurIPS), 2024 |
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. International Conference on Learning Representations (ICLR), 2024 |
Red-Teaming for Generative AI: Silver Bullet or Security Theater? AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024 |