2409.06820
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
10 September 2024
Ilya Gusev
LLMAG
Papers citing "PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation"
2 / 2 papers shown
Title
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Ivan Sviridov, Amina Miftakhova, Artemiy Tereshchenko, Galina Zubkova, Pavel Blinov, Andrey Savchenko
LM&MA
19
0
0
26 Mar 2025
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan, Yun Luo, Yue Zhang
LLMAG
45
1
0
25 Feb 2025