Title |
---|
![]() LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs Yujun Zhou Jingdong Yang Kehan Guo Pin-Yu Chen Tian Gao ...Tian Gao Werner Geyer Nuno Moniz Nitesh V Chawla Xiangliang Zhang |
![]() InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation Gaurav Sahu Abhay Puri Juan A. Rodriguez Alexandre Drouin Perouz Taslakian ...Christopher Pal Nicolas Chapados I. Laradji Sai Rajeswar Mudumba Issam Hadj Laradji |
![]() Scaling Large Language Model-based Multi-Agent Collaboration Chen Qian Zihao Xie YiFei Wang Wei Liu Yufan Dang ...Zhuoyun Du Weize Chen Cheng Yang Zhiyuan Liu Maosong Sun |