Title |
---|
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning. Jared Joselowitz, Arjun Jagota, Satyapriya Krishna, Sonali Parbhoo, Nyal Patel |
Permissive Information-Flow Analysis for Large Language Models. Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David M. Krueger, Andrew J. Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella Béguelin |