ResearchTrend.AI
Multimodal Web Navigation with Instruction-Finetuned Foundation Models

19 May 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
    LM&Ro

Papers citing "Multimodal Web Navigation with Instruction-Finetuned Foundation Models"

36 / 86 papers shown
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Kuang-Huei Lee
Xinyun Chen
Hiroki Furuta
John F. Canny
Ian S. Fischer
RALM
53
29
0
15 Feb 2024
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù
Zdeněk Kasner
Siva Reddy
22
59
0
08 Feb 2024
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil
Chan Hee Song
Boyuan Zheng
Xiang Deng
Yu-Chuan Su
Wei-Lun Chao
EgoV
22
12
0
06 Feb 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He
Wenlin Yao
Kaixin Ma
Wenhao Yu
Yong Dai
Hongming Zhang
Zhenzhong Lan
Dong Yu
LLMAG
30
121
0
25 Jan 2024
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Kanzhi Cheng
Qiushi Sun
Yougang Chu
Fangzhi Xu
Yantao Li
Jianbing Zhang
Zhiyong Wu
LLMAG
170
138
0
17 Jan 2024
MobileAgent: enhancing mobile control via human-machine interaction and SOP integration
Tinghe Ding
LLMAG
LM&Ro
34
6
0
04 Jan 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng
Boyu Gou
Jihyung Kil
Huan Sun
Yu-Chuan Su
MLLM
VLM
LLMAG
41
205
0
03 Jan 2024
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Ke Yang
Jiateng Liu
John Wu
Chaoqi Yang
Yi Ren Fung
...
Xu Cao
Xingyao Wang
Yiquan Wang
Heng Ji
Chengxiang Zhai
LLMAG
ELM
18
71
0
01 Jan 2024
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen
D. Pitawela
Chongyang Zhao
Gengze Zhou
Hsiang-Ting Chen
Qi Wu
29
8
0
25 Dec 2023
AppAgent: Multimodal Agents as Smartphone Users
C. Zhang
Zhao Yang
Jiaxuan Liu
Yucheng Han
Xin Chen
Zebiao Huang
Bin-Bin Fu
Gang Yu
LM&Ro
LLMAG
19
77
0
21 Dec 2023
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
Heyi Tao
TV Sethuraman
Michal Shlapentokh-Rothman
Derek Hoiem
LLMAG
48
4
0
24 Oct 2023
AllTogether: Investigating the Efficacy of Spliced Prompt for Web Navigation using Large Language Models
Jiarun Liu
Wentao Hu
Chunhong Zhang
14
2
0
20 Oct 2023
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Andy Zhou
Kai Yan
Michal Shlapentokh-Rothman
Haohan Wang
Yu-xiong Wang
LRM
AI4CE
LM&Ro
LLMAG
34
183
0
06 Oct 2023
SteP: Stacked LLM Policies for Web Actions
Paloma Sodhi
S. Branavan
Yoav Artzi
Ryan McDonald
LLMAG
22
26
0
05 Oct 2023
LASER: LLM Agent with State-Space Exploration for Web Navigation
Kaixin Ma
Hongming Zhang
Hongwei Wang
Xiaoman Pan
Wenhao Yu
Dong Yu
LLMAG
19
39
0
15 Sep 2023
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
Jingqing Ruan
Yihong Chen
Bin Zhang
Zhiwei Xu
Tianpeng Bao
...
Shiwei Shi
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
LM&Ro
39
32
0
07 Aug 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
25
195
0
24 Jul 2023
Android in the Wild: A Large-Scale Dataset for Android Device Control
Christopher Rawles
Alice Li
Daniel Rodriguez
Oriana Riva
Timothy Lillicrap
LM&Ro
16
137
0
19 Jul 2023
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Longtao Zheng
R. Wang
Xinrun Wang
Bo An
LLMAG
17
57
0
13 Jun 2023
Understanding HTML with Large Language Models
Izzeddin Gur
Ofir Nachum
Yingjie Miao
Mustafa Safdari
Austin Huang
Aakanksha Chowdhery
Sharan Narang
Noah Fiedel
Aleksandra Faust
AI4CE
134
70
0
08 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
158
262
0
07 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,470
0
06 Oct 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
155
453
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
16
60
0
07 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
139
435
0
10 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
4,048
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
WebFormer: The Web-page Transformer for Structure Information Extraction
Qifan Wang
Yi Fang
Anirudh Ravula
Fuli Feng
Xiaojun Quan
Dongfang Liu
ViT
141
65
0
01 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
223
897
0
28 Apr 2021
FLIN: A Flexible Natural Language Interface for Web Navigation
Sahisnu Mazumder
Oriana Riva
LRM
43
23
0
24 Oct 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,583
0
18 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019