Learning to Navigate the Web

21 December 2018

Papers citing "Learning to Navigate the Web"


ALLOY: Generating Reusable Agent Workflows from User Demonstration Jiawen Li Zheng Ning Yuan Tian Toby Jia-Jun Li LLMAG 100 0 0 11 Oct 2025
TextOnly: A Unified Function Portal for Text-Related Functions on Smartphones Minghao Tu Chun Yu Xiyuan Shen Zhi Zheng Li Chen Yuanchun Shi 88 0 0 23 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use Xueyu Hu Tao Xiong Biao Yi Zishu Wei Ruixuan Xiao ... Zhou Zhao Hongxia Yang Fan Wu Shengyu Zhang Fei Wu LLMAG LM&Ro AI4TS 230 29 0 06 Aug 2025
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Xing Han Lù Amirhossein Kazemnejad Nicholas Meade Arkil Patel Dongchan Shin Alejandra Zambrano Karolina Stañczak Peter Shaw Christopher Pal Siva Reddy LLMAG 343 17 0 11 Apr 2025
Inducing Programmatic Skills for Agentic Tasks Zora Z. Wang Apurva Gandhi Graham Neubig Daniel Fried LLMAG 375 19 0 09 Apr 2025
A2Perf: Real-World Autonomous Agents Benchmark Ikechukwu Uchendu Jason J. Jabbour Korneel Van den Berghe Joel Runevic Matthew P. Stewart ... S. Guadarrama Jie Tan Jordan K. Terry Aleksandra Faust Vijay Janapa Reddi 249 1 0 04 Mar 2025
AgentStudio: A Toolkit for Building General Virtual Agents. International Conference on Learning Representations (ICLR), 2024. Longtao Zheng Zhiyuan Huang Zhenghai Xue Xinrun Wang Bo An Shuicheng Yan 436 34 0 17 Feb 2025
RWKV-UI: UI Understanding with Enhanced Perception and Reasoning Jiaxi Yang Haowen Hou ReLM LRM 123 0 0 06 Feb 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web Hiroki Furuta Yutaka Matsuo Aleksandra Faust Izzeddin Gur CLL 591 19 0 03 Jan 2025
The BrowserGym Ecosystem for Web Agent Research Thibault Le Sellier De Chezelles Maxime Gasse Alexandre Lacoste Alexandre Drouin Massimo Caccia ... Siva Reddy Quentin Cappart Graham Neubig Ruslan Salakhutdinov Nicolas Chapados LLMAG 1.9K 62 0 06 Dec 2024
GUI Agents with Foundation Models: A Comprehensive Survey Shuai Wang Wen Liu Jingxuan Chen Weinan Gan Xingshan Zeng ... Bin Wang Chuhan Wu Yasheng Wang Ruiming Tang Jianye Hao LLMAG 458 70 0 07 Nov 2024
EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data Xuetian Chen Hangcheng Li Jiaqing Liang Sihang Jiang Deqing Yang LLMAG 444 7 0 25 Oct 2024
From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating UI Operation Impacts. International Conference on Intelligent User Interfaces (IUI), 2024. Zhuohao Jerry Zhang E. Schoop Jeffrey Nichols Anuj Mahajan Amanda Swearngin LLMAG 264 1 0 11 Oct 2024
TinyClick: Single-Turn Agent for Empowering GUI Automation Pawel Pawlowski Krystian Zawistowski Wojciech Lapacz Marcin Skorupa Adam Wiacek Sebastien Postansque Jakub Hoscilowicz LRM LLMAG MLLM 379 9 0 09 Oct 2024
NaviQAte: Functionality-Guided Web Application Navigation M. Shahbandeh Parsa Alian Noor Nashid Ali Mesbah 225 8 0 16 Sep 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? Ori Yoran S. Amouyal Chaitanya Malaviya Ben Bogin Ofir Press Jonathan Berant LLMAG 342 71 0 22 Jul 2024
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents Luyuan Wang Yongyu Deng Yiwei Zha Guodong Mao Qinmin Wang Tianchen Min Wei Chen Shoufa Chen LLMAG 188 44 0 12 Jun 2024
Benchmarking Mobile Device Control Agents across Diverse Configurations Juyong Lee Taywon Min Minyong An Dongyoon Hahm Changyeon Kim Kimin Lee 328 29 0 25 Apr 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Keen You Haotian Zhang E. Schoop Floris Weers Amanda Swearngin Jeffrey Nichols Yinfei Yang Zhe Gan MLLM 341 146 0 08 Apr 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents Kevin Xu Yeganeh Kordi Yizhong Wang Kate Sanders Adam Byerly Jingyu Zhang Benjamin Van Durme Daniel Khashabi LLMAG 491 16 0 18 Mar 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Raghav Kapoor Y. Butala M. Russak Jing Yu Koh Kiran Kamble Waseem Alshikh Ruslan Salakhutdinov LLMAG 471 103 0 27 Feb 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. Kanzhi Cheng Qiushi Sun Yougang Chu Fangzhi Xu Yantao Li Jianbing Zhang Zhiyong Wu LLMAG 663 344 0 17 Jan 2024
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation Difei Gao Lei Ji Zechen Bai Mingyu Ouyang Peiran Li ... Peiyi Wang Xiangwu Guo Hengxu Wang Luowei Zhou Mike Zheng Shou LLMAG 297 34 0 20 Dec 2023
UINav: A Practical Approach to Train On-Device Automation Agents. North American Chapter of the Association for Computational Linguistics (NAACL), 2023. Wei Li Fu-Lin Hsu Will Bishop Folawiyo Campbell-Ajala Max Lin Oriana Riva 511 4 0 15 Dec 2023
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API Zhizheng Zhang Wenxuan Xie Xiaoyi Zhang Yan Lu 189 16 0 07 Oct 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis. International Conference on Learning Representations (ICLR), 2023. Izzeddin Gur Hiroki Furuta Austin Huang Mustafa Safdari Yutaka Matsuo Douglas Eck Aleksandra Faust LM&Ro LLMAG 550 307 0 24 Jul 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control. Neural Information Processing Systems (NeurIPS), 2023. Christopher Rawles Alice Li Daniel Rodriguez Oriana Riva Timothy Lillicrap LM&Ro 396 249 0 19 Jul 2023
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control. International Conference on Learning Representations (ICLR), 2023. Longtao Zheng Rongpin Wang Xinrun Wang Bo An LLMAG 356 97 0 13 Jun 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces. Neural Information Processing Systems (NeurIPS), 2023. Peter Shaw Mandar Joshi James Cohan Jonathan Berant Panupong Pasupat Hexiang Hu Urvashi Khandelwal Kenton Lee Kristina Toutanova LLMAG LM&Ro 253 74 0 31 May 2023
Towards Cognitive Bots: Architectural Research Challenges. Artificial General Intelligence (AGI), 2023. Habtom Kahsay Gidey Peter Hillmann A. Karcher Alois Knoll 107 7 0 26 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models. International Conference on Learning Representations (ICLR), 2023. Hiroki Furuta Kuang-Huei Lee Ofir Nachum Yutaka Matsuo Aleksandra Faust S. Gu Izzeddin Gur LM&Ro 393 140 0 19 May 2023
A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Andrea Burns Krishna Srinivasan Joshua Ainslie Geoff Brown Bryan A. Plummer Kate Saenko Jianmo Ni Mandy Guo 3DV 204 15 0 05 May 2023
Language Models can Solve Computer Tasks. Neural Information Processing Systems (NeurIPS), 2023. Geunwoo Kim Pierre Baldi Alexander Shmakov LLMAG LM&Ro 522 459 0 30 Mar 2023
Augmented Language Models: a Survey Grégoire Mialon Roberto Dessì Maria Lomeli Christoforos Nalmpantis Ramakanth Pasunuru ... Jane Dwivedi-Yu Asli Celikyilmaz Edouard Grave Yann LeCun Thomas Scialom LRM KELM 254 482 0 15 Feb 2023
Lexi: Self-Supervised Learning of the UI Language. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Pratyay Banerjee Shweti Mahajan Kushal Arora Chitta Baral Oriana Riva 105 18 0 23 Jan 2023
Understanding HTML with Large Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. Izzeddin Gur Ofir Nachum Yingjie Miao Mustafa Safdari Austin Huang Aakanksha Chowdhery Sharan Narang Noah Fiedel Aleksandra Faust AI4CE 460 82 0 08 Oct 2022
MUG: Interactive Multimodal Grounding on User Interfaces. Findings (Findings), 2022. Tao Li Gang Li Jingjie Zheng Purple Wang Yang Li LLMAG 174 10 0 29 Sep 2022
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents. Neural Information Processing Systems (NeurIPS), 2022. Shunyu Yao Howard Chen John Yang Karthik Narasimhan LLMAG LM&Ro 763 740 0 04 Jul 2022
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization. Conference on Uncertainty in Artificial Intelligence (UAI), 2022. Sungryull Sohn Hyunjae Woo Jongwook Choi Lyubing Qiang Izzeddin Gur Aleksandra Faust Honglak Lee BDL OffRL 222 3 0 25 May 2022
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs Taichi Iki Akiko Aizawa LLMAG 175 6 0 15 Mar 2022
A data-driven approach for learning to control computers. International Conference on Machine Learning (ICML), 2022. Peter C. Humphreys David Raposo Tobias Pohlen Gregory Thornton Rachita Chhaparia ... Josh Abramson Petko Georgiev Alex Goldin Adam Santoro Timothy Lillicrap 311 115 0 16 Feb 2022
Environment Generation for Zero-Shot Compositional Reinforcement Learning. Neural Information Processing Systems (NeurIPS), 2022. Izzeddin Gur Natasha Jaques Yingjie Miao Jongwook Choi Manoj Kumar Tiwari Honglak Lee Aleksandra Faust 242 45 0 21 Jan 2022
WebGPT: Browser-assisted question-answering with human feedback Reiichiro Nakano Jacob Hilton S. Balaji Jeff Wu Ouyang Long ... Gretchen Krueger Kevin Button Matthew Knight B. Chess John Schulman ALM RALM 458 1,601 0 17 Dec 2021
Learning UI Navigation through Demonstrations composed of Macro Actions Wei Li LLMAG 133 9 0 16 Oct 2021
AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning Maayan Shvo Zhiming Hu Rodrigo Toro Icarte Iqbal Mohomed A. Jepson Sheila A. McIlraith 176 16 0 31 May 2021
Adversarial Environment Generation for Learning to Navigate the Web Izzeddin Gur Natasha Jaques Kevin Malta Manoj Kumar Tiwari Honglak Lee Aleksandra Faust 215 18 0 02 Mar 2021
Rapid Task-Solving in Novel Environments Samuel Ritter Ryan Faulkner Laurent Sartran Adam Santoro M. Botvinick David Raposo 163 30 0 05 Jun 2020
Evolving Rewards to Automate Reinforcement Learning Aleksandra Faust Anthony G. Francis Dar Mehta 200 52 0 18 May 2019