ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation. Neural Information Processing Systems (NeurIPS), 2024.
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias. Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, ..., Hugo J. W. L. Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman. Neural Information Processing Systems (NeurIPS), 2024.
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. International Conference on Learning Representations (ICLR), 2024.
SLANG: New Concept Comprehension of Large Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models. International Conference on Learning Representations (ICLR), 2024.