Title
What Is AI Safety? What Do We Want It to Be? Jacqueline Harding Cameron Domenico Kirk-Giannini 64 0 0 05 May 2025
The Precautionary Principle and the Innovation Principle: Incompatible Guides for AI Innovation Governance? Kim Kaivanto 22 0 0 01 May 2025
Real-World Gaps in AI Governance Research Ilan Strauss Isobel Moure Tim O'Reilly Sruly Rosenblat 61 0 0 30 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape Wenbo Guo Yujin Potter Tianneng Shi Zhun Wang Andy Zhang Dawn Song 52 1 0 07 Apr 2025
What Makes an Evaluation Useful? Common Pitfalls and Best Practices Gil Gekker Meirav Segal Dan Lahav Omer Nevo ELM 43 0 0 30 Mar 2025
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration Andy Zhou Kevin E. Wu Francesco Pinto Z. Chen Yi Zeng Yu Yang Shuang Yang Sanmi Koyejo James Zou Bo Li LLMAG AAML 75 0 0 20 Mar 2025
The AI Pentad, the CHARME$^{2}$D Model, and an Assessment of Current-State AI Regulation Di Kevin Gao Sudip Mittal Jiming Wu Hongwei Du Jingdao Chen Shahram Rahimi 33 0 0 08 Mar 2025
SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks Yue Gao Ilia Shumailov Kassem Fawaz AAML 126 0 0 21 Feb 2025
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions Doaa Mahmud Hadeel Hajmohamed Shamma Almentheri Shamma Alqaydi Lameya Aldhaheri R. A. Khalil Nasir Saeed AI4TS 38 5 0 08 Jan 2025
Towards Data Governance of Frontier AI Models Jason Hausenloy Duncan McClements Madhavendra Thakur 67 1 0 05 Dec 2024
Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation Peter Barnett Lisa Thiergart ELM 67 2 0 19 Nov 2024
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI Jonghong Jeon 29 2 0 29 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models Eddie L. Ungless Nikolas Vitsakis Zeerak Talat James Garforth Bjorn Ross Arno Onken Atoosa Kasirzadeh Alexandra Birch 28 1 0 17 Oct 2024
SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders Constantin Venhoff Anisoara Calinescu Philip H. S. Torr Christian Schroeder de Witt 28 0 0 09 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies Ritwik Gupta Leah Walker Rodolfo Corona Stephanie Fu Suzanne Petryk Janet Napolitano Trevor Darrell Andrew W. Reddie ELM 35 3 0 25 Sep 2024
Democratising Artificial Intelligence for Pandemic Preparedness and Global Governance in Latin American and Caribbean Countries Andre de Carvalho R. Bonidia Jude Dzevela Kong Mariana Dauhajre C. Struchiner ... Edian F. Franco Cesar Ugarte-Gil Patricia Espinoza-Lopez Gabriel Carrasco-Escobar Ulisses Rocha 38 0 0 21 Sep 2024
Can Editing LLMs Inject Harm? Canyu Chen Baixiang Huang Zekun Li Zhaorun Chen Shiyang Lai ... Xifeng Yan William Wang Philip H. S. Torr Dawn Song Kai Shu KELM 38 11 0 29 Jul 2024
Evaluating AI Evaluation: Perils and Prospects John Burden ELM 33 8 0 12 Jul 2024
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations Adam Dahlgren Lindstrom Leila Methnani Lea Krause Petter Ericson Íñigo Martínez de Rituerto de Troya Dimitri Coelho Mollo Roel Dobbe ALM 31 2 0 26 Jun 2024
Evolutionary Computation for the Design and Enrichment of General-Purpose Artificial Intelligence Systems: Survey and Prospects Javier Poyatos Javier Del Ser Salvador Garcia H. Ishibuchi Daniel Molina I. Triguero Bing Xue Xin Yao Francisco Herrera 35 1 0 03 Jun 2024
Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models Meftahul Ferdaus Mahdi Abdelguerfi Elias Ioup Kendall N. Niles Ken Pathak Steve Sloan 32 10 0 01 Jun 2024
Societal Adaptation to Advanced AI Jamie Bernardi Gabriel Mukobi Hilary Greaves Lennart Heim Markus Anderljung 40 4 0 16 May 2024
Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act Sinan Arda 22 3 0 17 Apr 2024
The Necessity of AI Audit Standards Boards David Manheim Sammy Martin Mark Bailey Mikhail Samin Ross Gruetzmacher 26 7 0 11 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models Ruibo Liu Jerry W. Wei Fangyu Liu Chenglei Si Yanzhe Zhang ... Steven Zheng Daiyi Peng Diyi Yang Denny Zhou Andrew M. Dai SyDa EgoV 41 85 0 11 Apr 2024
Responsible Reporting for Frontier AI Development Noam Kolt Markus Anderljung Joslyn Barnhart Asher Brass K. Esvelt Gillian K. Hadfield Lennart Heim Mikel Rodriguez Jonas B. Sandbrink Thomas Woodside 34 13 0 03 Apr 2024
Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation D. Grabb Max Lamparth N. Vasan 38 14 0 02 Apr 2024
Are large language models superhuman chemists? Adrian Mirza Nawaf Alampara Sreekanth Kunchapu Benedict Emoekabu Aswanth Krishnan ... Leanne M. Stafast Dinga Wonanke Michael Pieler P. Schwaller K. Jablonka ELM AI4MH LRM LM&MA 26 4 0 01 Apr 2024
Trust AI Regulation? Discerning users are vital to build trust and effective AI regulation Zainab Alalawi Paolo Bova Theodor Cimpeanu A. D. Stefano M. H. Duong ... Han The Anh Marcus Krellner Bianca Ogbo Simon T. Powers Filippo Zimmaro 43 4 0 14 Mar 2024
On the Societal Impact of Open Foundation Models Sayash Kapoor Rishi Bommasani Kevin Klyman Shayne Longpre Ashwin Ramaswami ... Victor Storchan Daniel Zhang Daniel E. Ho Percy Liang Arvind Narayanan 26 54 0 27 Feb 2024
Foundation Model Transparency Reports Rishi Bommasani Kevin Klyman Shayne Longpre Betty Xiong Sayash Kapoor Nestor Maslej Arvind Narayanan Percy Liang 32 15 0 26 Feb 2024
Exploring ChatGPT and its Impact on Society Md. Asraful Haque Shuai Li SILM 25 24 0 21 Feb 2024
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages Junjie Ye Sixian Li Guanyu Li Caishuang Huang Songyang Gao Yilong Wu Qi Zhang Tao Gui Xuanjing Huang LLMAG 25 16 0 16 Feb 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review Thilo Hagendorff 21 35 0 13 Feb 2024
On Catastrophic Inheritance of Large Foundation Models Hao Chen Bhiksha Raj Xing Xie Jindong Wang AI4CE 48 12 0 02 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ... Michael Gerovitch David Bau Max Tegmark David M. Krueger Dylan Hadfield-Menell AAML 13 76 0 25 Jan 2024
Visibility into AI Agents Alan Chan Carson Ezell Max Kaufmann K. Wei Lewis Hammond ... Nitarshan Rajkumar David M. Krueger Noam Kolt Lennart Heim Markus Anderljung 13 31 0 23 Jan 2024
A Survey on the Applications of Frontier AI, Foundation Models, and Large Language Models to Intelligent Transportation Systems Mohamed R. Shoaib Heba M. Emara Jun Zhao AI4TS AI4CE LRM LM&MA 17 6 0 12 Jan 2024
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models Alan Chan Ben Bucknall Herbie Bradley David M. Krueger 8 6 0 22 Dec 2023
Aligning Human Intent from Imperfect Demonstrations with Confidence-based Inverse soft-Q Learning Xizhou Bu Wenjuan Li Zhengxiong Liu Zhiqiang Ma Panfeng Huang 20 1 0 18 Dec 2023
The Philosopher's Stone: Trojaning Plugins of Large Language Models Tian Dong Minhui Xue Guoxing Chen Rayne Holland Shaofeng Li Yan Meng Zhen Liu Haojin Zhu AAML 18 9 0 01 Dec 2023
Deepfakes, Misinformation, and Disinformation in the Era of Frontier AI, Generative AI, and Large AI Models Mohamed R. Shoaib Ze Wang Milad Taleby Ahvanooey Jun Zhao 17 38 0 29 Nov 2023
Towards Responsible Governance of Biological Design Tools Richard Moulange Max Langenkamp Tessa Alexanian Samuel Curtis Morgan Livingston ELM SILM 21 2 0 27 Nov 2023
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework Markus Anderljung Everett Thornton Smith Joe O'Brien Lisa Soder Ben Bucknall Emma Bluemke Jonas Schuett Robert F. Trager Lacey Strahm Rumman Chowdhury 19 16 0 15 Nov 2023
Market Concentration Implications of Foundation Models Jai Vipra Anton Korinek ELM 29 16 0 02 Nov 2023
Contextual Confidence and Generative AI Shrey Jain Zoe Hitzig Pamela Mishkin 25 5 0 02 Nov 2023
Managing extreme AI risks amid rapid progress Yoshua Bengio Geoffrey Hinton Andrew Yao Dawn Song Pieter Abbeel ... Philip H. S. Torr Stuart J. Russell Daniel Kahneman J. Brauner Sören Mindermann 24 63 0 26 Oct 2023
Multinational AGI Consortium (MAGIC): A Proposal for International Coordination on AI Jason Hausenloy Andrea Miotti Claire Dennis 10 1 0 13 Oct 2023
Welfare Diplomacy: Benchmarking Language Model Cooperation Gabriel Mukobi Hannah Erlebach Niklas Lauffer Lewis Hammond Alan Chan Jesse Clifton LM&Ro 15 13 0 13 Oct 2023
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives Elizabeth Seger Noemi Dreksler Richard Moulange Emily Dardaman Jonas Schuett ... Emma Bluemke Michael Aird Patrick Levermore Julian Hazell Abhishek Gupta 11 40 0 29 Sep 2023