ResearchTrend.AI
arXiv: 2407.01502
AI Agents That Matter

1 July 2024
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan

Papers citing "AI Agents That Matter"

27 / 27 papers shown
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Mehmet Hamza Erol
Batu El
Mirac Suzgun
Mert Yuksekgonul
J. Zou
ELM
31
0
0
17 Apr 2025
Evaluating the Goal-Directedness of Large Language Models
Tom Everitt
Cristina Garbacea
Alexis Bellot
Jonathan G. Richens
Henry Papadatos
Simeon Campos
Rohin Shah
ELM
LM&MA
LM&Ro
LRM
68
0
0
16 Apr 2025
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
Yunxiang Zhang
Muhammad Khalifa
Shitanshu Bhushan
Grant D Murphy
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LLMAG
ELM
62
0
0
13 Apr 2025
Attention-Aware Multi-View Pedestrian Tracking
Reef Alturki
Adrian Hilton
Jean-Yves Guillemaut
28
0
0
03 Apr 2025
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
Tobias Gessler
Tin Dizdarevic
Ani Calinescu
Benjamin Ellis
Andrei Lupu
Jakob Foerster
49
0
0
22 Mar 2025
Multi-Agent Systems Execute Arbitrary Malicious Code
Harold Triedman
Rishi Jha
Vitaly Shmatikov
LLMAG
AAML
89
2
0
15 Mar 2025
Evaluating the Process Modeling Abilities of Large Language Models -- Preliminary Foundations and Results
Peter Fettke
Constantin Houy
ELM
35
0
0
14 Mar 2025
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Mourad Gridach
Jay Nanavati
Khaldoun Zine El Abidine
Lenon Mendes
Christina Mack
48
3
0
12 Mar 2025
Measuring AI agent autonomy: Towards a scalable approach with code inspection
Peter Cihon
Merlin Stein
Gagan Bansal
Sam Manning
Kevin Xu
26
0
0
21 Feb 2025
The AI Agent Index
Stephen Casper
Luke Bailey
Rosco Hunter
Carson Ezell
Emma Cabalé
...
Phillip J. K. Christoffersen
A. Pinar Ozisik
Rakshit Trivedi
Dylan Hadfield-Menell
Noam Kolt
66
4
0
03 Feb 2025
Cocoa: Co-Planning and Co-Execution with AI Agents
K. J. Kevin Feng
Kevin Pu
Matt Latzke
Tal August
Pao Siangliulue
Jonathan Bragg
Daniel S. Weld
Amy X. Zhang
Joseph Chee Chang
LM&Ro
LLMAG
87
4
0
14 Dec 2024
Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications
Raphael Shu
Nilaksh Das
Michelle Yuan
Monica Sunkara
Yi Zhang
LLMAG
69
2
0
06 Dec 2024
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
70
1
0
29 Nov 2024
Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Benedikt Stroebl
Sayash Kapoor
Arvind Narayanan
LRM
82
6
0
26 Nov 2024
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Ido Levy
Ben Wiesel
Sami Marreed
Alon Oved
Avi Yaeli
Segev Shlomov
LLMAG
29
13
0
09 Oct 2024
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring
Kunhao Zheng
Jade Copet
Vegard Mella
Taco Cohen
Gabriel Synnaeve
LLMAG
27
20
0
02 Oct 2024
SEAL: Suite for Evaluating API-use of LLMs
Woojeong Kim
Ashish Jagmohan
Aditya Vempaty
ELM
ALM
LLMAG
30
0
0
23 Sep 2024
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Zachary S. Siegel
Sayash Kapoor
Nitya Nadgir
Benedikt Stroebl
Arvind Narayanan
27
8
0
17 Sep 2024
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering
Yiqing Shen
Zan Chen
Michail Mamalakis
Yungeng Liu
Tianbin Li
Yanzhou Su
Junjun He
Pietro Liò
Yu Guang Wang
LLMAG
30
8
0
27 Aug 2024
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems
Tamer Abuelsaad
Deepak Akkil
Prasenjit Dey
Ashish Jagmohan
Aditya Vempaty
Ravi Kokku
39
23
0
17 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
42
2
0
17 Jul 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
33
45
0
24 Jun 2024
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang
Carlos E. Jimenez
Alexander Wettig
K. Lieret
Shunyu Yao
Karthik Narasimhan
Ofir Press
LLMAG
99
188
0
06 May 2024
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin
Maxime Gasse
Massimo Caccia
I. Laradji
Manuel Del Verme
...
Megh Thakkar
Quentin Cappart
David Vazquez
Nicolas Chapados
Alexandre Lacoste
LLMAG
48
51
0
12 Mar 2024
More Agents Is All You Need
Junyou Li
Qin Zhang
Yangbin Yu
Qiang Fu
Deheng Ye
LLMAG
133
57
0
03 Feb 2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Subbarao Kambhampati
Karthik Valmeekam
L. Guan
Mudit Verma
Kaya Stechly
Siddhant Bhambri
Lucas Saldyt
Anil Murthy
LRM
78
107
0
02 Feb 2024
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
125
137
0
19 Sep 2023