2505.08638
TRAIL: Trace Reasoning and Agentic Issue Localization
13 May 2025
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian
Papers citing
"TRAIL: Trace Reasoning and Agentic Issue Localization"
24 / 24 papers shown
Repairing Tool Calls Using Post-tool Execution Reflection and RAG
Jason Tsay
Zidane Wright
Gaodan Fang
Kiran Kate
Saurabh Jha
Yara Rizk
84
0
0
17 Oct 2025
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Allison Sihan Jia
Daniel Huang
Nikhil Vytla
Nirvika Choudhury
John C. Mitchell
Anupam Datta
98
0
0
09 Oct 2025
Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis
Yu Ge
Linna Xie
Zhong Li
Yu Pei
Tian Zhang
148
4
0
17 Sep 2025
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Fanqi Kong
Ruijie Zhang
Huaxiao Yin
Guibin Zhang
X. Zhang
Ziang Chen
Zhaowei Zhang
Xiaoyuan Zhang
Song-Chun Zhu
Xue Feng
AAML
272
1
0
17 Sep 2025
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Jiawei Shen
Jingjiang Liu
Yidan Liang
117
1
0
10 Sep 2025
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu
Spencer Hong
Jingyu Wu
Kushal Chawla
Charlotte Tang
Youbing Yin
Nathan Wolfe
Erin Babinsky
Daben Liu
116
0
0
08 Sep 2025
When Agents go Astray: Course-Correcting SWE Agents with PRMs
Shubham Gandhi
Jason Tsay
Jatin Ganhotra
Kiran Kate
Yara Rizk
100
4
0
02 Sep 2025
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems
Dany Moshkovich
Sergey Zeltyn
66
4
0
15 Jul 2025
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
312
90
0
09 May 2025
Synergizing RAG and Reasoning: A Systematic Review
Yunfan Gao
Yun Xiong
Yijie Zhong
Yuxi Bi
Ming Xue
Haoyu Wang
LRM
AI4CE
971
23
0
22 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq Joty
ELM
ALM
LRM
396
20
0
21 Apr 2025
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Zixuan Ke
Fangkai Jiao
Yifei Ming
Xuan-Phi Nguyen
Austin Xu
...
Chengwei Qin
Peifeng Wang
Siyang Song
Caiming Xiong
Shafiq Joty
LRM
322
20
0
12 Apr 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
243
4
0
24 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
425
64
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq Joty
ELM
320
13
0
19 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
551
163
0
17 Mar 2025
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems
Dany Moshkovich
Hadar Mulian
Sergey Zeltyn
Natti Eder
Inna Skarbovsky
Roy Abitbol
147
7
0
09 Mar 2025
Interactive Debugging and Steering of Multi-Agent AI Systems
International Conference on Human Factors in Computing Systems (CHI), 2025
Will Epperson
Gagan Bansal
Victor C. Dibia
Adam Fourney
Jack Gerrits
Erkang Zhu
Saleema Amershi
231
29
0
03 Mar 2025
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
Yutong Wang
Pengliang Ji
Chaoqun Yang
Kaixin Li
Ming Hu
Jiaoyang Li
Guillaume Sartoretti
LRM
ELM
189
12
0
18 Feb 2025
Do LLMs estimate uncertainty well in instruction-following?
International Conference on Learning Representations (ICLR), 2024
Juyeon Heo
Miao Xiong
Christina Heinze-Deml
Jaya Narain
ELM
317
12
0
18 Oct 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM
ELM
429
60
0
23 Aug 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Knowledge Discovery and Data Mining (KDD), 2024
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro
LLMAG
560
36
0
01 Aug 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
387
69
0
09 Jun 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM
LM&Ro
518
94
0
08 May 2024