2402.01622
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
2 February 2024
Jian Xie
Kai Zhang
Jiangjie Chen
Tinghui Zhu
Renze Lou
Yuandong Tian
Yanghua Xiao
Yu-Chuan Su
LLMAG
LM&Ro
Papers citing "TravelPlanner: A Benchmark for Real-World Planning with Language Agents"
50 / 103 papers shown
HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking
Runquan Gui
Z. Wang
J. Wang
Chi Ma
Huiling Zhen
M. Yuan
Jianye Hao
Defu Lian
Enhong Chen
Feng Wu
LRM
42
0
0
05 May 2025
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents
Takyoung Kim
Janvijay Singh
Shuhaib Mehri
Emre Can Acikgoz
Sagnik Mukherjee
Nimet Beyza Bozdag
Sumuk Shashidhar
Gökhan Tür
Dilek Hakkani-Tür
LLMAG
25
0
0
02 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
LawFlow : Collecting and Simulating Lawyers' Thought Processes
Debarati Das
Khanh Chi Le
R. Parkar
Karin de Langis
Brendan Madson
...
Robin M. Willis
Daniel H. Moses
Brett McDonnell
Daniel Schwarcz
Dongyeop Kang
AILaw
69
0
0
26 Apr 2025
PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
Haoming Li
Zhaoliang Chen
Jonathan Zhang
Fei Liu
LLMAG
33
0
0
21 Apr 2025
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang
Zhuoqun Hao
Y. Cho
B. Li
Yuan Yuan
Sihao Chen
Lyle Ungar
Camillo J. Taylor
Dan Roth
32
0
0
19 Apr 2025
GraphicBench: A Planning Benchmark for Graphic Design with Language Agents
Dayeon Ki
Tianyi Zhou
Marine Carpuat
Gang Wu
Puneet Mathur
Viswanathan Swaminathan
LLMAG
LM&Ro
48
0
0
15 Apr 2025
SynthTRIPs: A Knowledge-Grounded Framework for Benchmark Query Generation for Personalized Tourism Recommenders
Ashmi Banerjee
Adithi Satish
Fitri Nur Aisyah
Wolfgang Wörndl
Yashar Deldjoo
AI4TS
31
0
0
12 Apr 2025
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
Hang Ni
Fan Liu
Xinyu Ma
Lixin Su
S. Wang
Dawei Yin
Hui Xiong
Hao Liu
LLMAG
AI4TS
52
0
0
11 Apr 2025
Inducing Programmatic Skills for Agentic Tasks
Zora Zhiruo Wang
Apurva Gandhi
Graham Neubig
Daniel Fried
LLMAG
40
0
0
09 Apr 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz
Cheng Qian
Hongru Wang
Vardhan Dongre
X. Chen
Heng Ji
Dilek Hakkani-Tür
Gökhan Tür
LM&Ro
ELM
43
1
0
07 Apr 2025
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Peijie Yu
Yifan Yang
J. Li
Zelong Zhang
Haorui Wang
Xiao Feng
Feng Zhang
LLMAG
97
0
0
03 Apr 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
C. Huang
73
0
0
02 Apr 2025
LLMs as Planning Modelers: A Survey for Leveraging Large Language Models to Construct Automated Planning Models
Marcus Tantakoun
Xiaodan Zhu
Christian Muise
36
0
0
22 Mar 2025
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Jian-Yu Guan
J. Wu
J. Li
Chuanqi Cheng
Wei Yu Wu
LM&MA
69
0
0
21 Mar 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
48
2
0
19 Mar 2025
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam
Spandan Garg
Jinu Jang
Neel Sundaresan
Roshanak Zilouchian Moghaddam
LLMAG
LRM
62
2
0
10 Mar 2025
DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments
Wenjie Tang
Yuan Zhou
Erqiang Xu
Keyan Cheng
Minne Li
Liquan Xiao
ELM
47
1
0
08 Mar 2025
Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
Kibum Kim
Sein Kim
Hongseok Kang
Jiwan Kim
Heewoong Noh
Yeonjun In
Kanghoon Yoon
Jinoh Oh
Chanyoung Park
60
0
0
08 Mar 2025
Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions
Zirui Wu
Xiao Liu
Jiayi Li
Lingpeng Kong
Yansong Feng
39
1
0
04 Mar 2025
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains
Wonje Choi
Jinwoo Park
Sanghyun Ahn
Daehee Lee
Honguk Woo
36
1
0
02 Mar 2025
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao
Pengfei Cao
Zhuoran Jin
Huanxuan Liao
Yubo Chen
Kang Liu
Jun Zhao
LLMAG
65
1
0
02 Mar 2025
TripCraft: A Benchmark for Spatio-Temporally Fine Grained Travel Planning
Soumyabrata Chaudhuri
Pranav Purkar
Ritwik Raghav
Shubhojit Mallick
Manish Gupta
Abhik Jana
Shreya Ghosh
AI4TS
29
1
0
27 Feb 2025
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning
Yanan Chen
Ali Pesaranghader
Tanmana Sadhu
LRM
54
0
0
26 Feb 2025
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
Christine P. Lee
David J. Porfirio
Xinyu Jessica Wang
Kevin Zhao
Bilge Mutlu
80
1
0
25 Feb 2025
Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization
Ran Ding
Ziyu Zhang
Ying Zhu
Ziqian Kong
Peilan Xu
65
0
0
20 Feb 2025
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
Shubham Parashar
Blake Olson
Sambhav Khurana
Eric Li
Hongyi Ling
James Caverlee
Shuiwang Ji
LRM
ReLM
81
8
0
18 Feb 2025
RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents
Weizhe Chen
Sven Koenig
B. Dilkina
LLMAG
94
8
0
17 Feb 2025
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Hui Wei
Zihao Zhang
Shenghua He
Tian Xia
Shijia Pan
Fei Liu
41
3
0
16 Feb 2025
The Philosophical Foundations of Growing AI Like A Child
Dezhi Luo
Yijiang Li
Hokin Deng
ReLM
LRM
39
1
0
15 Feb 2025
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
Kunal Handa
Alex Tamkin
Miles McCain
Saffron Huang
Esin Durmus
...
Kevin K. Troy
Dario Amodei
Jared Kaplan
Jack Clark
Deep Ganguli
MLAU
55
0
0
11 Feb 2025
Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Spatial Reasoning Questions
Dazhou Yu
Riyang Bao
Gengchen Mai
Liang Zhao
ReLM
LRM
43
3
0
04 Feb 2025
The AI Agent Index
Stephen Casper
Luke Bailey
Rosco Hunter
Carson Ezell
Emma Cabalé
...
Phillip J. K. Christoffersen
A. Pinar Ozisik
Rakshit Trivedi
Dylan Hadfield-Menell
Noam Kolt
66
4
0
03 Feb 2025
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Mahir Labib Dihan
Md Tanvir Hassan
Md Tanvir Parvez
Md Hasebul Hasan
Md Almash Alam
Muhammad Aamir Cheema
Mohammed Eunus Ali
Md. Rizwan Parvez
LRM
ELM
22
1
0
03 Jan 2025
ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Disscusion via Argumentation Schemes
Shengxin Hong
Liang Xiao
Xin Zhang
Jian-Xing Chen
LRM
33
2
0
31 Dec 2024
Beyond Partisan Leaning: A Comparative Analysis of Political Bias in Large Language Models
Kaiqi Yang
Hang Li
Yucheng Chu
Hang Li
Tai-Quan Peng
Yuping Lin
Hui Liu
80
1
0
21 Dec 2024
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
82
20
0
20 Dec 2024
Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
Song Jiang
Da JU
Andrew Cohen
Sasha Mitts
Aaron Foss
Justine T Kao
Xian Li
Yuandong Tian
57
2
0
21 Nov 2024
Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems
Taaha Kazi
Ruiliang Lyu
Sizhe Zhou
Dilek Hakkani-Tür
Gökhan Tür
ELM
LLMAG
21
1
0
15 Nov 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Lingpeng Kong
AI4CE
61
14
0
23 Oct 2024
To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning
Da JU
Song Jiang
A. Cohen
Aaron Foss
Sasha Mitts
...
Brandon Amos
Xian Li
Justine T Kao
Maryam Fazel-Zarandi
Yuandong Tian
LLMAG
20
4
0
21 Oct 2024
Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators
Timothy Wei
Annabelle Miin
Anastasia Miin
11
0
0
19 Oct 2024
Revealing the Barriers of Language Agents in Planning
Jian Xie
Kexun Zhang
Jiangjie Chen
Siyu Yuan
Kai Zhang
Yikai Zhang
Lei Li
Yanghua Xiao
LM&Ro
AIFin
LRM
22
1
0
16 Oct 2024
Denial-of-Service Poisoning Attacks against Large Language Models
Kuofeng Gao
Tianyu Pang
Chao Du
Yong Yang
Shu-Tao Xia
Min-Bin Lin
SILM
AAML
47
14
0
14 Oct 2024
ACPBench: Reasoning about Action, Change, and Planning
Harsha Kokel
Michael Katz
Kavitha Srinivas
Shirin Sohrabi
ReLM
LRM
29
0
0
08 Oct 2024
AgentSquare: Automatic LLM Agent Search in Modular Design Space
Yu Shang
Yu Li
Keyu Zhao
Likai Ma
J. Liu
Fengli Xu
Yong Li
LLMAG
28
8
0
08 Oct 2024
JumpStarter: Getting Started on Personal Goals with AI-Powered Context Curation
Sitong Wang
Xuanming Zhang
Jenny Ma
Alyssa Hwang
Lydia B. Chilton
22
0
0
04 Oct 2024
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang
Jingyuan Huang
Kai Mei
Yifei Yao
Zhenting Wang
Chenlu Zhan
Hongwei Wang
Yongfeng Zhang
AAML
LLMAG
ELM
42
17
0
03 Oct 2024
Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets
Yuandong Tian
41
0
0
02 Oct 2024
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li
Zhicheng Sun
Fei Li
74
1
0
02 Oct 2024