TRAIL: Trace Reasoning and Agentic Issue Localization

13 May 2025
Darshan Deshpande
Varun Gangal
Hersh Mehta
Jitin Krishnan
Anand Kannappan
Rebecca Qian

Papers citing "TRAIL: Trace Reasoning and Agentic Issue Localization"

24 / 24 papers shown
Repairing Tool Calls Using Post-tool Execution Reflection and RAG
Jason Tsay
Zidane Wright
Gaodan Fang
Kiran Kate
Saurabh Jha
Yara Rizk
84
0
0
17 Oct 2025
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
Allison Sihan Jia
Daniel Huang
Nikhil Vytla
Nirvika Choudhury
John C. Mitchell
Anupam Datta
98
0
0
09 Oct 2025
Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis
Yu Ge
Linna Xie
Zhong Li
Yu Pei
Tian Zhang
148
4
0
17 Sep 2025
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Fanqi Kong
Ruijie Zhang
Huaxiao Yin
Guibin Zhang
X. Zhang
Ziang Chen
Zhaowei Zhang
Xiaoyuan Zhang
Song-Chun Zhu
Xue Feng
AAML
272
1
0
17 Sep 2025
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Jiawei Shen
Jingjiang Liu
Yidan Liang
117
1
0
10 Sep 2025
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu
Spencer Hong
Jingyu Wu
Kushal Chawla
Charlotte Tang
Youbing Yin
Nathan Wolfe
Erin Babinsky
Daben Liu
116
0
0
08 Sep 2025
When Agents go Astray: Course-Correcting SWE Agents with PRMs
Shubham Gandhi
Jason Tsay
Jatin Ganhotra
Kiran Kate
Yara Rizk
100
4
0
02 Sep 2025
Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems
Dany Moshkovich
Sergey Zeltyn
66
4
0
15 Jul 2025
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
312
90
0
09 May 2025
Synergizing RAG and Reasoning: A Systematic Review
Yunfan Gao
Yun Xiong
Yijie Zhong
Yuxi Bi
Ming Xue
Haoyu Wang
LRM, AI4CE
971
23
0
22 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq Joty
ELM, ALM, LRM
396
20
0
21 Apr 2025
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Zixuan Ke
Fangkai Jiao
Yifei Ming
Xuan-Phi Nguyen
Austin Xu
...
Chengwei Qin
Peifeng Wang
Siyang Song
Caiming Xiong
Shafiq Joty
LRM
322
20
0
12 Apr 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
243
4
0
24 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG, ELM
425
64
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq Joty
ELM
320
13
0
19 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
551
163
0
17 Mar 2025
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems
Dany Moshkovich
Hadar Mulian
Sergey Zeltyn
Natti Eder
Inna Skarbovsky
Roy Abitbol
147
7
0
09 Mar 2025
Interactive Debugging and Steering of Multi-Agent AI Systems
International Conference on Human Factors in Computing Systems (CHI), 2025
Will Epperson
Gagan Bansal
Victor C. Dibia
Adam Fourney
Jack Gerrits
Erkang Zhu
Saleema Amershi
231
29
0
03 Mar 2025
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
Yutong Wang
Pengliang Ji
Chaoqun Yang
Kaixin Li
Ming Hu
Jiaoyang Li
Guillaume Sartoretti
LRM, ELM
189
12
0
18 Feb 2025
Do LLMs estimate uncertainty well in instruction-following?
International Conference on Learning Representations (ICLR), 2024
Juyeon Heo
Miao Xiong
Christina Heinze-Deml
Jaya Narain
ELM
317
12
0
18 Oct 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM, ELM
429
60
0
23 Aug 2024
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Knowledge Discovery and Data Mining (KDD), 2024
Mengkang Hu
Yixiao Wang
Can Xu
Lingfeng Sun
Chensheng Peng
T. Hannagan
Nicola Poerio
Saravan Rajmohan
LM&Ro, LLMAG
560
36
0
01 Aug 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM, ALM, LM&MA
387
69
0
09 Jun 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM, LM&Ro
518
94
0
08 May 2024