Dual-View Visual Contextualization for Web Navigation

6 February 2024

Wei-Lun Chao

Papers citing "Dual-View Visual Contextualization for Web Navigation"

Fundamentals of Building Autonomous LLM Agents

Victor de Lamo Castrillo

10 Oct 2025

Watch and Learn: Learning to Use Computers from Online Videos

06 Oct 2025

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

...

22 Aug 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

...

06 Aug 2025

Turbocharging Web Automation: The Impact of Compressed History States. Annual Meeting of the Association for Computational Linguistics (ACL), 2025

28 Jul 2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

...

22 May 2025

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

...

30 Mar 2025

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration. Computer Vision and Pattern Recognition (CVPR), 2025

22 Mar 2025

SpiritSight Agent: Advanced GUI Agent with One Look. Computer Vision and Pattern Recognition (CVPR), 2025

05 Mar 2025

Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning. AAAI Conference on Artificial Intelligence (AAAI), 2024

14 Dec 2024

The BrowserGym Ecosystem for Web Agent Research

Thibault Le Sellier De Chezelles

...

06 Dec 2024

MMInA: Benchmarking Multihop Multimodal Internet Agents

15 Apr 2024

Tur[k]ingBench: A Challenge Benchmark for Web Agents

Kate Sanders

Adam Byerly

Jingyu Zhang

Benjamin Van Durme

Daniel Khashabi

18 Mar 2024