The community introduces new metrics, methodologies, or frameworks for evaluating language models.
| Title | Authors |
|---|---|
| LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal | Michał Karp, Anna Kubaszewska, Magdalena Król, Robert Król, Aleksander Smywiński-Pohl, Mateusz Szymański, Witold Wydmański |
| From Model to Breach: Towards Actionable LLM-Generated Vulnerabilities Reporting | Cyril Vallez, Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic |
| GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents | Jian Mu, Chaoyun Zhang, Chiming Ni, Lu Wang, Bo Qiao, ..., Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang |
| Generate, Evaluate, Iterate: Synthetic Data for Human-in-the-Loop Refinement of LLM Judges | Hyo Jin Do, Zahra Ashktorab, Jasmina Gajcin, Erik Miehling, Martín Santillán Cooper, Qian Pan, Elizabeth M. Daly, Werner Geyer |
| Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study | Haoyu Guo, Maria Tikhanovskaya, Paul Raccuglia, Alexey Vlaskin, Chris Co, ..., T. Senthil, J. M. Tranquada, Michael P. Brenner, Subhashini Venugopalan, Eun-Ah Kim |