ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.11854
  4. Cited By
Multimodal Web Navigation with Instruction-Finetuned Foundation Models

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

19 May 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
    LM&Ro
ArXivPDFHTML

Papers citing "Multimodal Web Navigation with Instruction-Finetuned Foundation Models"

50 / 86 papers shown
Title
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Harshvardhan Sikka
LM&Ro
VLM
91
0
0
08 May 2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng
Weihao Tan
Zhiyi Lyu
Longtao Zheng
Haiyang Xu
M. Yan
Fei Huang
Bo An
22
0
0
01 May 2025
Explorer: Robust Collection of Interactable GUI Elements
Explorer: Robust Collection of Interactable GUI Elements
Iason Chaimalas
Arnas Vyšniauskas
Gabriel Brostow
21
0
0
12 Apr 2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng
Michael Y. Fatemi
Xiaolong Jin
Z. Wang
Apurva Gandhi
...
Yu Gu
Jayanth Srinivasa
Gaowen Liu
Graham Neubig
Yu Su
CLL
34
0
0
09 Apr 2025
An Illusion of Progress? Assessing the Current State of Web Agents
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
92
4
1
02 Apr 2025
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
Liangbo Ning
Ziran Liang
Zhuohang Jiang
Haohao Qu
Yujuan Ding
...
Xiao Wei
Shanru Lin
Hui Liu
Philip S. Yu
Qing Li
LLMAG
LM&Ro
91
5
0
30 Mar 2025
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun
Shanhui Zhao
Tao Yu
Hao Wen
Samith Va
Mengwei Xu
Yuanchun Li
Chongyang Zhang
LLMAG
62
0
0
22 Mar 2025
Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
Dongjun Lee
Juyong Lee
Kyuyoung Kim
Jihoon Tack
Jinwoo Shin
Yee Whye Teh
Kimin Lee
LLMAG
60
2
0
12 Mar 2025
Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions
Mourad Gridach
Jay Nanavati
Khaldoun Zine El Abidine
Lenon Mendes
Christina Mack
48
3
0
12 Mar 2025
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
Lutfi Eren Erdogan
Nicholas Lee
Sehoon Kim
Suhong Moon
Hiroki Furuta
Gopala Anumanchipalli
K. K.
Amir Gholami
LLMAG
LM&Ro
AIFin
76
2
0
12 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Y. Li
LM&Ro
LM&MA
87
1
0
07 Mar 2025
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur
Nicholas Meade
Xing Han Lù
Alejandra Zambrano
Arkil Patel
Esin Durmus
Spandana Gella
Karolina Stañczak
Siva Reddy
LLMAG
ELM
85
2
0
06 Mar 2025
SpiritSight Agent: Advanced GUI Agent with One Look
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang
Ziming Cheng
Junting Pan
Zhaohui Hou
Mingjie Zhan
LLMAG
99
2
0
05 Mar 2025
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
Wenjia Jiang
Yangyang Zhuang
Chenxi Song
Xu Yang
Chi Zhang
Chi Zhang
LLMAG
84
1
0
04 Mar 2025
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes
Alan Ritter
LRM
57
3
0
04 Mar 2025
Magma: A Foundation Model for Multimodal AI Agents
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
107
9
0
18 Feb 2025
WebWalker: Benchmarking LLMs in Web Traversal
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu
Wenbiao Yin
Yong-feng Jiang
Zhenglin Wang
Zekun Xi
...
Linhai Zhang
Yulan He
Deyu Zhou
Pengjun Xie
Fei Huang
41
5
0
13 Jan 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Hiroki Furuta
Yutaka Matsuo
Aleksandra Faust
Izzeddin Gur
CLL
85
13
0
03 Jan 2025
EscapeBench: Pushing Language Models to Think Outside the Box
EscapeBench: Pushing Language Models to Think Outside the Box
Cheng Qian
Peixuan Han
Qinyu Luo
Bingxiang He
X. Chen
...
Jiarui Yao
Xiaocheng Yang
Denghui Zhang
Yunzhu Li
Heng Ji
LLMAG
LRM
82
3
0
18 Dec 2024
The BrowserGym Ecosystem for Web Agent Research
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier De Chezelles
Maxime Gasse
Alexandre Lacoste
Alexandre Drouin
Massimo Caccia
...
Siva Reddy
Quentin Cappart
Graham Neubig
Ruslan Salakhutdinov
Nicolas Chapados
LLMAG
98
9
0
06 Dec 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Z. Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
72
13
0
26 Nov 2024
GUI Agents with Foundation Models: A Comprehensive Survey
GUI Agents with Foundation Models: A Comprehensive Survey
Shuai Wang
W. Liu
Jingxuan Chen
Weinan Gan
Xingshan Zeng
...
Bin Wang
Chuhan Wu
Yasheng Wang
Ruiming Tang
Jianye Hao
LLMAG
68
12
0
07 Nov 2024
EDGE: Enhanced Grounded GUI Understanding with Enriched
  Multi-Granularity Synthetic Data
EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
Xuetian Chen
Hangcheng Li
Jiaqing Liang
Sihang Jiang
Deqing Yang
LLMAG
46
2
0
25 Oct 2024
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Jang
Yinheng Li
Charles Ding
Justin Lin
Paul Pu Liang
Dan Zhao
Rogerio Bonatti
K. Koishida
33
5
0
24 Oct 2024
Lightweight Neural App Control
Lightweight Neural App Control
Filippos Christianos
Georgios Papoudakis
Thomas Coste
Jianye Hao
Jun Wang
Kun Shao
LM&Ro
52
4
0
23 Oct 2024
Large Language Models Empowered Personalized Web Agents
Large Language Models Empowered Personalized Web Agents
Hongru Cai
Yongqi Li
W. Wang
Fengbin Zhu
Xiaoyu Shen
Wenjie Li
Tat-Seng Chua
LLMAG
43
12
0
22 Oct 2024
Synatra: Turning Indirect Knowledge into Direct Demonstrations for
  Digital Agents at Scale
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
Tianyue Ou
Frank F. Xu
Aman Madaan
J. Liu
Robert Lo
Abishek Sridhar
Sudipta Sengupta
Dan Roth
Graham Neubig
Shuyan Zhou
OffRL
31
9
0
24 Sep 2024
Steward: Natural Language Web Automation
Steward: Natural Language Web Automation
Brian Tang
Kang G. Shin
LLMAG
29
1
0
23 Sep 2024
NaviQAte: Functionality-Guided Web Application Navigation
NaviQAte: Functionality-Guided Web Application Navigation
M. Shahbandeh
Parsa Alian
Noor Nashid
Ali Mesbah
18
2
0
16 Sep 2024
OmniParser for Pure Vision Based GUI Agent
OmniParser for Pure Vision Based GUI Agent
Yadong Lu
Jianwei Yang
Yelong Shen
Ahmed Hassan Awadallah
MLLM
27
33
0
01 Aug 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
Ori Yoran
S. Amouyal
Chaitanya Malaviya
Ben Bogin
Ofir Press
Jonathan Berant
LLMAG
35
31
0
22 Jul 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model
  Agents
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Shilong Liu
Bochen Qian
Philip H. S. Torr
Bernard Ghanem
G. Li
38
14
0
01 Jul 2024
Tree Search for Language Model Agents
Tree Search for Language Model Agents
Jing Yu Koh
Stephen Marcus McAleer
Daniel Fried
Ruslan Salakhutdinov
LM&Ro
LLMAG
LRM
51
56
0
01 Jul 2024
Identifying User Goals from UI Trajectories
Identifying User Goals from UI Trajectories
Omri Berkovitch
Sapir Caduri
Noam Kahlon
Anatoly Efros
Avi Caciularu
Ido Dagan
LLMAG
27
4
0
20 Jun 2024
WebCanvas: Benchmarking Web Agents in Online Environments
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan
Dehan Kong
Sida Zhou
Cheng Cui
Yifei Leng
...
Hangyu Liu
Yanyi Shang
Shuyan Zhou
Tongshuang Wu
Zhengyang Wu
24
26
0
18 Jun 2024
KAOS: Large Model Multi-Agent Operating System
KAOS: Large Model Multi-Agent Operating System
Zhao Zhuo
Rongzhen Li
Kai Liu
Huhai Zou
KaiMao Li
Jie Yu
Tianhao Sun
Qingbo Wu
VLM
LLMAG
36
1
0
17 Jun 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents
GUICourse: From General Vision Language Models to Versatile GUI Agents
Wentong Chen
Junbo Cui
Jinyi Hu
Yujia Qin
Junjie Fang
...
Yupeng Huo
Yuan Yao
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
31
30
0
17 Jun 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin
Linjie Li
Difei Gao
Qinchen Wu
Mingyi Yan
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
39
10
0
14 Jun 2024
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks
  with Front-End UI Only
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only
Junhee Cho
Jihoon Kim
Daseul Bae
Jinho Choo
Youngjune Gwon
Yeong-Dae Kwon
LLMAG
29
1
0
11 Jun 2024
Adaptive In-conversation Team Building for Language Model Agents
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song
Jiale Liu
Jieyu Zhang
Shaokun Zhang
Ao Luo
Shijian Wang
Qingyun Wu
Chi Wang
LLMAG
63
10
0
29 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
62
44
0
23 May 2024
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Michael Lutz
Arth Bohra
Manvel Saroyan
Artem Harutyunyan
Giovanni Campagna
LLMAG
30
13
0
08 Apr 2024
Your Co-Workers Matter: Evaluating Collaborative Capabilities of
  Language Models in Blocks World
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
Guande Wu
Chen Zhao
Claudio Silva
He He
LLMAG
16
4
0
30 Mar 2024
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
Zonghan Yang
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Janet Liu
LLMAG
LRM
44
8
0
21 Mar 2024
BAGEL: Bootstrapping Agents by Guiding Exploration with Language
BAGEL: Bootstrapping Agents by Guiding Exploration with Language
Shikhar Murty
Christopher D. Manning
Peter Shaw
Mandar Joshi
Kenton Lee
LM&Ro
LLMAG
26
14
0
12 Mar 2024
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work
  Tasks?
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin
Maxime Gasse
Massimo Caccia
I. Laradji
Manuel Del Verme
...
Megh Thakkar
Quentin Cappart
David Vazquez
Nicolas Chapados
Alexandre Lacoste
LLMAG
51
53
0
12 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
26
3
0
10 Mar 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist
  Autonomous Agents for Desktop and Web
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
51
44
0
27 Feb 2024
WIPI: A New Web Threat for LLM-Driven Web Agents
WIPI: A New Web Threat for LLM-Driven Web Agents
Fangzhou Wu
Shutong Wu
Yulong Cao
Chaowei Xiao
LLMAG
32
17
0
26 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language
  Agents
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
41
14
0
15 Feb 2024
12
Next